(57) Abstract

A method and system for quantifying the relative abundance of gene
transcripts in a biological specimen. One embodiment of the method
generates high-throughput sequence-specific analysis of multiple RNAs
or their corresponding cDNAs (gene transcript imaging analysis).
Another embodiment of the method produces a gene transcript imaging
analysis by the use of high-throughput cDNA sequence analysis. In
addition, the gene transcript imaging can be used to detect or
diagnose a particular biological state, disease, or condition which is
correlated to the relative abundance of gene transcripts in a given
cell or population of cells. The invention provides a method for
comparing the gene transcript image analysis from two or more
different biological specimens in order to distinguish between the two
specimens and identify one or more genes which are differentially
expressed between the two specimens.
COMPARATIVE GENE TRANSCRIPT ANALYSIS

1. FIELD OF INVENTION

The present invention is in the field of molecular biology and
computer science; more particularly, the present invention describes
methods of analyzing gene transcripts and diagnosing the genetic
expression of cells and tissue.

2. BACKGROUND OF THE INVENTION

Until very recently, the history of molecular biology has been written
one gene at a time. Scientists have observed the cell's physical
changes, isolated mixtures from the cell or its milieu, purified
proteins, sequenced proteins, and therefrom constructed probes to look
for the corresponding gene.

Recently, different nations have set up massive projects to sequence
the billions of bases in the human genome. These projects typically
begin with dividing the genome into large portions of chromosomes and
then determining the sequences of these pieces, which are then
analyzed for identity with known proteins or portions thereof, known
as motifs. Unfortunately, the majority of genomic DNA does not encode
proteins and, though it is postulated to have some effect on the
cell's ability to make protein, its relevance to medical applications
is not understood at this time.

A third methodology involves sequencing only the transcripts encoding
the cellular machinery actively involved in making protein, namely the
mRNA. The advantage is that the cell has already edited out all the
non-coding DNA, and it is relatively easy to identify the
protein-coding portion of the RNA. The utility of this approach was
not immediately obvious to genomic researchers. In fact, when cDNA
sequencing was initially proposed, the method was roundly denounced by
those committed to genomic sequencing. For example, the head of the
U.S. Human Genome project discounted cDNA sequencing as not valuable
and refused to approve funding of projects.

In this disclosure, we teach methods for analyzing DNA, including cDNA
libraries. Based on our analyses and research, we see each individual
gene product as a "pixel" of information, which relates to the
expression of that, and only that, gene. We teach herein methods
whereby the individual "pixels" of gene expression information can be
combined into a single gene transcript "image," in which each of the
individual genes can be visualized simultaneously, allowing
relationships between the gene pixels to be easily visualized and
understood.

We further teach a new method, which we call electronic subtraction.
Electronic subtraction will enable the gene researcher to turn a
single image into a moving picture, one which describes the
temporality or dynamics of gene expression at the level of a cell or a
whole tissue. It is that sense of "motion" of cellular machinery on
the scale of a cell or organ which constitutes the new invention
herein. This constitutes a new view into the process of living cell
physiology and one which holds great promise to unveil and discover
new therapeutic and diagnostic approaches in medicine.

We teach another method, which we call "electronic northern," which
tracks the expression of a single gene across many types of cells and
tissues.
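
The "electronic northern" idea can be illustrated with a minimal
sketch: given the genes identified for the clones of several
libraries, report how often one gene appears in each. The function
name, the library names, and the gene labels below are hypothetical
illustration data, not values from this disclosure.

```python
from collections import Counter

def electronic_northern(libraries, gene):
    """Return the fraction of clones identified as `gene` in each library.

    `libraries` maps a (hypothetical) library name to the list of gene
    labels assigned to its sequenced clones.
    """
    profile = {}
    for name, clone_genes in libraries.items():
        counts = Counter(clone_genes)
        total = len(clone_genes)
        # An empty library contributes a frequency of zero.
        profile[name] = counts[gene] / total if total else 0.0
    return profile

# Hypothetical clone identifications for two cell types.
libraries = {
    "monocyte": ["IL1B", "ACTB", "ACTB", "EF1A"],
    "endothelial": ["ACTB", "FN1", "FN1", "FN1", "EF1A"],
}
profile = electronic_northern(libraries, "ACTB")
for name, freq in profile.items():
    print(f"{name}: {freq:.2f}")
```

Reporting a fraction rather than a raw count is a design choice made
here so that libraries of different sizes can be compared.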

Nucleic acids (DNA and RNA) carry within their sequence the hereditary
information and are therefore the prime molecules of life. Nucleic
acids are found in all living organisms including bacteria, fungi,
viruses, plants and animals. It is of interest to determine the
relative abundance of different discrete nucleic acids in different
cells, tissues and organisms over time under various conditions,
treatments and regimes.

All dividing cells in the human body contain the same set of 23 pairs
of chromosomes. It is estimated that these autosomal and sex
chromosomes encode approximately 100,000 genes. The differences among
different types of cells are believed to reflect the differential
expression of the 100,000 or so genes. Fundamental questions of
biology could be answered by understanding which genes are transcribed
and knowing the relative abundance of transcripts in different cells.
Previously, the art has only provided for the analysis of a few known
genes at a time by standard molecular biology techniques such as PCR,
northern blot analysis, or other types of DNA probe analysis such as
in situ hybridization. Each of these methods allows one to analyze the
transcription of only known genes and/or small numbers of genes at a
time. Nucl. Acids Res. 19, 7097-7104 (1991); Nucl. Acids Res. 18,
4833-42 (1990); Nucl. Acids Res. 18, 2789-92 (1989); European J.
Neuroscience 2, 1063-1073 (1990); Analytical Biochem. 182, 364-73
(1990); Genet. Annals Techn. Appl. 2, 64-70 (1990); GATA 8(4), 129-33
(1991); Proc. Natl. Acad. Sci. USA 85, 1696-1700 (1988); Nucl. Acids
Res. 19, 1954 (1991); Proc. Natl. Acad. Sci. USA 88, 1943-47 (1991);
Nucl. Acids Res. 19, 6123-27 (1991); Proc. Natl. Acad. Sci. USA 85,
5738-42 (1988); Nucl. Acids Res. 16, 10937 (1988).

Studies of the number and types of genes whose transcription is
induced or otherwise regulated during cell processes such as
activation, differentiation, aging, viral transformation,
morphogenesis, and mitosis have been pursued for many years, using a
variety of methodologies. One of the earliest methods was to isolate
and analyze levels of the proteins in a cell, tissue, organ system, or
even organisms both before and after the process of interest. One
method of analyzing multiple proteins in a sample is using
2-dimensional gel electrophoresis, wherein proteins can be, in
principle, identified and quantified as individual bands, and
ultimately reduced to a discrete signal. At present, 2-dimensional
analysis only resolves approximately 15% of the proteins. In order to
positively analyze those bands which are resolved, each band must be
excised from the membrane and subjected to protein sequence analysis
using Edman degradation. Unfortunately, most of the bands were present
in quantities too small to obtain a reliable sequence, and many of
those bands contained more than one discrete protein. An additional
difficulty is that many of the proteins were blocked at the
amino-terminus, further complicating the sequencing process.
Analyzing differentiation at the gene transcription level has overcome
many of these disadvantages and drawbacks, since the power of
recombinant DNA technology allows amplification of signals containing
very small amounts of material. The most common method, called
"hybridization subtraction," involves isolation of mRNA from the
biological specimen before (B) and after (A) the developmental process
of interest, transcribing one set of mRNA into cDNA, subtracting
specimen B from specimen A (mRNA from cDNA) by hybridization, and
constructing a cDNA library from the non-hybridizing mRNA fraction.
Many different groups have used this strategy successfully, and a
variety of procedures have been published and improved upon using this
same basic scheme. Nucl. Acids Res. 19, 7097-7104 (1991); Nucl. Acids
Res. 18, 4833-42 (1990); Nucl. Acids Res. 18, 2789-92 (1989); European
J. Neuroscience 2, 1063-1073 (1990); Analytical Biochem. 182, 364-73
(1990); Genet. Annals Techn. Appl. 2, 64-70 (1990); GATA 8(4), 129-33
(1991); Proc. Natl. Acad. Sci. USA 85, 1696-1700 (1988); Nucl. Acids
Res. 19, 1954 (1991); Proc. Natl. Acad. Sci. USA 88, 1943-47 (1991);
Nucl. Acids Res. 19, 6123-27 (1991); Proc. Natl. Acad. Sci. USA 85,
5738-42 (1988); Nucl. Acids Res. 16, 10937 (1988).

Although each of these techniques has particular strengths and
weaknesses, there are still some limitations and undesirable aspects
of these methods. First, the time and effort required to construct
such libraries is quite large. Typically, a trained molecular
biologist might expect construction and characterization of such a
library to require 3 to 6 months, depending on the level of skill,
experience, and luck. Second, the resulting subtraction libraries are
typically inferior to the libraries constructed by standard
methodology. A typical conventional cDNA library should have a clone
complexity of at least 10^6 clones, and an average insert size of 1-3
kB. In contrast, subtracted libraries can have complexities of 10^2 or
10^3 and average insert sizes of 0.2 kB. Therefore, there can be a
significant loss of clone and sequence information associated with
such libraries. Third, this approach allows the researcher to capture
only the genes induced in specimen A relative to specimen B, not
vice-versa, nor does it easily allow comparison to a third specimen of
interest (C). Fourth, this approach requires very large amounts
(hundreds of micrograms) of "driver" mRNA (specimen B), which
significantly limits the number and type of subtractions that are
possible since many tissues and cells are very difficult to obtain in
large quantities.

Fifth, the resolution of the subtraction is dependent upon the
physical properties of DNA:DNA or RNA:DNA hybridization. The ability
of a given sequence to find a hybridization match is dependent on its
unique CoT value. The CoT value is a function of the number of copies
(concentration) of the particular sequence multiplied by the time of
hybridization. It follows that for sequences which are abundant,
hybridization events will occur very rapidly (low CoT value), while
rare sequences will form duplexes at very high CoT values. CoT values
which allow such rare sequences to form duplexes and therefore be
effectively selected are difficult to achieve in a convenient time
frame. Therefore, hybridization subtraction is simply not a useful
technique with which to study relative levels of rare mRNA species.
Sixth, this problem is further complicated by the fact that duplex
formation is also dependent on the nucleotide base composition for a
given sequence. Those sequences rich in G + C form stronger duplexes
than those with high contents of A + T. Therefore, the former
sequences will tend to be removed selectively by hybridization
subtraction. Seventh, it is possible that hybridization between
nonexact matches can occur. When this happens, the expression of a
homologous gene may "mask" expression of a gene of interest,
artificially skewing the results for that particular gene.
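
To make the fifth limitation concrete, the sketch below treats the CoT
value as the product of a sequence's concentration and the
hybridization time, as stated above, and assumes (for illustration
only) that every sequence must reach the same product before it forms
duplexes. The function name, the arbitrary units, and the
concentrations are hypothetical.

```python
def hybridization_time(required_cot, concentration):
    """Time needed for a sequence to reach `required_cot`.

    Assumes CoT = concentration * time, so time = CoT / concentration.
    Units are arbitrary; only the ratio between sequences matters here.
    """
    if concentration <= 0:
        raise ValueError("concentration must be positive")
    return required_cot / concentration

# Hypothetical concentrations: the rare message is 1000-fold less abundant.
abundant_time = hybridization_time(required_cot=1.0, concentration=1e-2)
rare_time = hybridization_time(required_cot=1.0, concentration=1e-5)
slowdown = rare_time / abundant_time
print(f"rare sequence needs about {slowdown:.0f}x longer")
```

Under this simplified assumption the required time scales inversely
with abundance, which is why rare messages are hard to capture in a
convenient time frame.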

Matsubara and Okubo proposed using partial cDNA sequences to establish
expression profiles of genes which could be used in functional
analyses of the human genome. Matsubara and Okubo warned against using
random priming, as it creates multiple unique DNA fragments from
individual mRNAs and may thus skew the analysis of the number of
particular mRNAs per library. They sequenced randomly selected members
from a 3'-directed cDNA library and established the frequency of
appearance of the various ESTs. They proposed comparing lists of ESTs
from various cell types to classify genes. Genes expressed in many
different cell types were labeled housekeepers and those selectively
expressed in certain cells were labeled cell-specific genes, even in
the absence of the full sequence of the gene or the biological
activity of the gene product.

The present invention avoids the drawbacks of the prior art by
providing a method to quantify the relative abundance of multiple gene
transcripts in a given biological specimen by the use of
high-throughput sequence-specific analysis of individual RNAs and/or
their corresponding cDNAs.

The present invention offers several advantages over current protein
discovery methods which attempt to isolate individual proteins based
upon biological effects. The method of the instant invention provides
for detailed diagnostic comparisons of cell profiles revealing
numerous changes in the expression of individual transcripts.

The instant invention provides several advantages over current
subtraction methods including a more complex library analysis (10^6 to
10^7 clones as compared to 10^3 clones) which allows identification of
low abundance messages as well as enabling the identification of
messages which either increase or decrease in abundance. These large
libraries are very routine to make, in contrast to the libraries of
previous methods. In addition, homologues can easily be distinguished
with the method of the instant invention.

This method is very convenient because it organizes a large quantity
of data into a comprehensible, digestible format. The most significant
differences are highlighted by electronic subtraction. In-depth
analyses are made more convenient.
The present invention provides several advantages over previous
methods of electronic analysis of cDNA. The method is particularly
powerful when more than 100 and preferably more than 1,000 gene
transcripts are analyzed. In such a case, new low-frequency
transcripts are discovered and tissue-typed.

High resolution analysis of gene expression can be used directly as a
diagnostic profile or to identify disease-specific genes for the
development of more classic diagnostic approaches.

This process is defined as gene transcript frequency analysis. The
resulting quantitative analysis of the gene transcripts is defined as
comparative gene transcript analysis.

3. SUMMARY OF THE INVENTION

The invention is a method of analyzing a specimen containing gene
transcripts comprising the steps of (a) producing a library of
biological sequences; (b) generating a set of transcript sequences,
where each of the transcript sequences in said set is indicative of a
different one of the biological sequences of the library; (c)
processing the transcript sequences in a programmed computer (in which
a database of reference transcript sequences indicative of reference
sequences is stored), to generate an identified sequence value for
each of the transcript sequences, where each said identified sequence
value is indicative of sequence annotation and a degree of match
between one of the biological sequences of the library and at least
one of the reference sequences; and (d) processing each said
identified sequence value to generate final data values indicative of
the number of times each identified sequence value is present in the
library.
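
Steps (c) and (d) can be illustrated with a minimal sketch. It assumes
a toy reference database and uses position-by-position identity as a
stand-in for a real sequence comparison; the reference names, the
sequences, the 0.9 threshold, and the "exact"/"similar"/"none" labels
are hypothetical choices made only for this illustration.

```python
from collections import Counter

# Hypothetical reference database: annotation -> reference sequence.
REFERENCE = {
    "GENE_A": "ATGGCCAAGT",
    "GENE_B": "ATGTTTCCGA",
}

def identify(seq, reference, min_identity=0.9):
    """Step (c): return (annotation, degree_of_match) for one sequence.

    Identity is the fraction of matching positions relative to the
    longer sequence; a real system would use a proper alignment.
    """
    if not seq:
        return ("unidentified", "none")
    best_name, best_identity = None, 0.0
    for name, ref in reference.items():
        matches = sum(a == b for a, b in zip(seq, ref))
        identity = matches / max(len(seq), len(ref))
        if identity > best_identity:
            best_name, best_identity = name, identity
    if best_identity == 1.0:
        return (best_name, "exact")
    if best_identity >= min_identity:
        return (best_name, "similar")
    return ("unidentified", "none")

# Hypothetical transcript sequences read from one library.
library = ["ATGGCCAAGT", "ATGGCCAAGT", "ATGGCCAAGA",
           "ATGTTTCCGA", "GGGGGGGGGG"]

# Step (d): count how often each identified sequence value occurs.
abundance = Counter(identify(seq, REFERENCE) for seq in library)
for value, n in abundance.most_common():
    print(value, n)
```

Keeping the degree of match inside the counted key means that exact
and merely similar hits to the same reference are tallied separately.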

The invention also includes a method of comparing two specimens
containing gene transcripts. The first specimen is processed as
described above. The second specimen is used to produce a second
library of biological sequences, which is used to generate a second
set of transcript sequences, where each of the transcript sequences in
the second set is indicative of one of the biological sequences of the
second library. Then the second set of transcript sequences is
processed in a programmed computer to generate a second set of
identified sequence values, namely the further identified sequence
values, each of which is indicative of a sequence annotation and
includes a degree of match between one of the biological sequences of
the second library and at least one of the reference sequences. The
further identified sequence values are processed to generate further
final data values indicative of the number of times each further
identified sequence value is present in the second library. The final
data values from the first specimen and the further identified
sequence values from the second specimen are processed to generate
ratios of transcript sequences, which indicate the differences in the
number of gene transcripts between the two specimens.

In a further embodiment, the method includes quantifying the relative
abundance of mRNA in a biological specimen by (a) isolating a
population of mRNA transcripts from a biological specimen; (b)
identifying genes from which the mRNA was transcribed by a
sequence-specific method; (c) determining the numbers of mRNA
transcripts corresponding to each of the genes; and (d) using the mRNA
transcript numbers to determine the relative abundance of mRNA
transcripts within the population of mRNA transcripts.

Also disclosed is a method of producing a gene transcript image
analysis by first obtaining a mixture of mRNA, from which cDNA copies
are made. The cDNA is inserted into a suitable vector which is used to
transfect suitable host strain cells which are plated out and
permitted to grow into clones, each clone representing a unique mRNA.
A representative population of clones transfected with cDNA is
isolated. Each clone in the population is identified by a
sequence-specific method which identifies the gene from which the
unique mRNA was transcribed. The number of times each gene is
identified to a clone is determined to evaluate gene transcript
abundance. The genes and their abundances are listed in order of
abundance to produce a gene transcript image.

In a further embodiment, the relative abundance of the gene
transcripts in one cell type or tissue is compared with the relative
abundance of gene transcript numbers in a second cell type or tissue
in order to identify the differences and similarities.
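
The gene transcript image described above (genes listed in order of
abundance, with their relative abundance in the clone population) can
be sketched as follows. The function name, the tuple layout, and the
gene labels are hypothetical; the input is assumed to be one gene
label per identified clone.

```python
from collections import Counter

def transcript_image(clone_genes):
    """Return rows of (rank, gene, count, relative_abundance).

    Rows are ordered by descending count; ties are broken by gene
    name so that the ordering is deterministic.
    """
    counts = Counter(clone_genes)
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [
        (rank, gene, n, n / total)
        for rank, (gene, n) in enumerate(ranked, start=1)
    ]

# Hypothetical gene identifications for six clones.
clones = ["EF1A", "ACTB", "EF1A", "FN1", "EF1A", "ACTB"]
image = transcript_image(clones)
for rank, gene, n, fraction in image:
    print(f"{rank:>2}  {gene:<6} {n:>3}  {fraction:.3f}")
```

Two such images, one per cell type or tissue, can then be compared
row by row to look for differences and similarities.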

In a further embodiment, the method includes a system for analyzing a
library of biological sequences including a means for receiving a set
of transcript sequences, where each of the transcript sequences is
indicative of a different one of the biological sequences of the
library; and a means for processing the transcript sequences in a
computer system in which a database of reference transcript sequences
indicative of reference sequences is stored, wherein the computer is
programmed with software for generating an identified sequence value
for each of the transcript sequences, where each said identified
sequence value is indicative of a sequence annotation and the degree
of match between a different one of the biological sequences of the
library and at least one of the reference sequences, and for
processing each said identified sequence value to generate final data
values indicative of the number of times each identified sequence
value is present in the library.

In essence, the invention is a method and system for quantifying the
relative abundance of gene transcripts in a biological specimen. The
invention provides a method for comparing the gene transcript image
from two or more different biological specimens in order to
distinguish between the two specimens and identify one or more genes
which are differentially expressed between the two specimens. Thus,
this gene transcript image and its comparison can be used as a
diagnostic. One embodiment of the method generates high-throughput
sequence-specific analysis of multiple RNAs or their corresponding
cDNAs: a gene transcript image. Another embodiment of the method
produces the gene transcript imaging analysis by the use of
high-throughput cDNA sequence analysis. In addition, two or more gene
transcript images can be compared and used to detect or diagnose a
particular biological state, disease, or condition which is correlated
to the relative abundance of gene transcripts in a given cell or
population of cells.

4. DESCRIPTION OF THE TABLES AND DRAWINGS

4.1. BRIEF DESCRIPTION OF THE TABLES

Table 1 presents a detailed explanation of the letter codes utilized
in Tables 2-5.

Table 2 lists the one hundred most common gene transcripts. It is a
partial list of isolates from the HUVEC cDNA library prepared and
sequenced as described below. The left-hand column refers to the
sequence's order of abundance in this table. The next column, labeled
"number," is the clone number of the first HUVEC sequence
identification reference matching the sequence in the "entry" column
number. Isolates that have not been sequenced are not present in Table
2. The next column, labeled "N," indicates the total number of cDNAs
which have the same degree of match with the sequence of the reference
transcript in the "entry" column.

The column labeled "entry" gives the NIH GENBANK locus name, which
corresponds to the library sequence numbers. The "s" column indicates,
in a few cases, the species of the reference sequence. The code for
column "s" is given in Table 1. The column labeled "descriptor"
provides a plain English explanation of the identity of the sequence
corresponding to the NIH GENBANK locus name in the "entry" column.

Table 3 is a comparison of the top fifteen most abundant gene
transcripts in normal monocytes and activated macrophage cells.

Table 4 is a detailed summary of library subtraction analysis
comparing the THP-1 and human macrophage cDNA sequences. In Table 4,
the same code as in Table 2 is used. Additional columns are for
"bgfreq" (abundance number in the subtractant library), "rfend"
(abundance number in the target library) and "ratio" (the target
abundance number divided by the subtractant abundance number). As is
clear from perusal of the table, when the abundance number in the
subtractant library is "0", the target abundance number is divided by
0.05. This is a way of obtaining a result (division by 0 not being
possible) and distinguishing the result from ratios of subtractant
numbers of 1.
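
The "ratio" column described for Table 4 can be sketched as follows.
This is not the program of Table 5; it is a minimal illustration that
reuses the column names "bgfreq" and "rfend" from the text, with a
hypothetical function name and hypothetical abundance numbers.

```python
def subtraction_ratios(target, subtractant, zero_substitute=0.05):
    """Return {gene: rfend / bgfreq} for every gene in the target library.

    When a gene is absent from the subtractant library (bgfreq == 0),
    the target abundance is divided by `zero_substitute` instead, as
    described for Table 4.
    """
    ratios = {}
    for gene, rfend in target.items():
        bgfreq = subtractant.get(gene, 0)
        denominator = bgfreq if bgfreq > 0 else zero_substitute
        ratios[gene] = rfend / denominator
    return ratios

# Hypothetical abundance numbers for a target and a subtractant library.
target = {"IL1B": 8, "ACTB": 10, "TNF": 3}
subtractant = {"ACTB": 20, "IL1B": 2}
ratios = subtraction_ratios(target, subtractant)
for gene, ratio in sorted(ratios.items(), key=lambda item: -item[1]):
    print(f"{gene:<5} {ratio:8.2f}")
```

Because 0.05 is smaller than any real count, a gene found only in the
target library always receives a larger ratio than the same target
count would against a subtractant count of 1.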

Table 5 is the computer program, written in source code, for
generating gene transcript subtraction profiles.

Table 6 is a partial listing of database entries used in the
electronic northern blot analysis as provided by the present
invention.

4.2. BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a chart summarizing data collected and stored regarding
the library construction portion of sequence preparation and analysis.

Figure 2 is a diagram representing the sequence of operations
performed by "abundance sort" software in a class of preferred
embodiments of the inventive method.

Figure 3 is a block diagram of a preferred embodiment of the system of
the invention.

Figure 4 is a more detailed block diagram of the bioinformatics
process from new sequence (that has already been sequenced but not
identified) to printout of the transcript imaging analysis and the
provision of database subscriptions.

5. DETAILED DESCRIPTION OF THE INVENTION

The present invention provides a method to compare the relative
abundance of gene transcripts in different biological specimens by the
use of high-throughput sequence-specific analysis of individual RNAs
or their corresponding cDNAs (or alternatively, of data representing
other biological sequences). This process is denoted herein as gene
transcript imaging. The quantitative analysis of the relative
abundance for a set of gene transcripts is denoted herein as "gene
transcript image analysis" or "gene transcript frequency analysis".
The present invention allows one to obtain a profile for gene
transcription in any given population of cells or tissue from any type
of organism. The invention can be applied to obtain a profile of a
specimen consisting of a single cell (or clones of a single cell), or
of many cells, or of tissue more complex than a single cell and
containing multiple cell types, such as liver.

The invention has significant advantages in the fields of diagnostics,
toxicology and pharmacology, to name a few. A highly sophisticated
diagnostic test can be performed on the ill patient in whom a
diagnosis has not been made. A biological specimen consisting of the
patient's fluids or tissues is obtained, and the gene transcripts are
isolated and expanded to the extent necessary to determine their
identity. Optionally, the gene transcripts can be converted to cDNA. A
sampling of the gene transcripts is subjected to sequence-specific
analysis and quantified. These gene transcript sequence abundances are
compared against reference database sequence abundances including
normal data sets for diseased and healthy patients. The patient has
the disease(s) with which the patient's data set most closely
correlates.

For example, gene transcript frequency analysis can be used to
differentiate normal cells or tissues from diseased cells or tissues,
just as it highlights differences between normal monocytes and
activated macrophages in Table 3.
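
The diagnostic comparison described above, in which a patient's
transcript abundances are matched against reference data sets, can be
sketched as follows. The text does not specify how correlation is
measured; the Pearson coefficient used here is an assumption, and the
function names, gene labels, reference names, and abundance values are
hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if undefined)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0

def closest_reference(patient, references):
    """Return (best_name, scores) for the best-correlated reference set.

    Profiles map gene -> relative abundance; genes missing from a
    profile are treated as abundance 0.
    """
    genes = sorted(set(patient).union(*references.values()))
    patient_vec = [patient.get(g, 0.0) for g in genes]
    scores = {
        name: pearson(patient_vec, [ref.get(g, 0.0) for g in genes])
        for name, ref in references.items()
    }
    return max(scores, key=scores.get), scores

# Hypothetical relative abundances for three genes.
references = {
    "healthy": {"A": 0.5, "B": 0.3, "C": 0.2},
    "disease_X": {"A": 0.1, "B": 0.2, "C": 0.7},
}
patient = {"A": 0.15, "B": 0.2, "C": 0.65}
best, scores = closest_reference(patient, references)
print(best, {k: round(v, 3) for k, v in scores.items()})
```

A real comparison would involve many more transcripts and a choice of
similarity measure suited to count data; this sketch only shows the
shape of the lookup.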

In toxicology, a fundamental question is which tests are most
effective in predicting or detecting a toxic effect. Gene transcript
imaging provides highly detailed information on the cell and tissue
environment, some of which would not be obvious in conventional, less
detailed screening methods. The gene transcript image is a more
powerful method to predict drug toxicity and efficacy. Similar
benefits accrue in the use of this tool in pharmacology. The gene
transcript image can be used selectively to look at protein categories
which are expected to be affected, for example, enzymes which detoxify
toxins.

In an alternative embodiment, comparative gene transcript frequency
analysis is used to differentiate between cancer cells which respond
to anti-cancer agents and those which do not respond. Examples of
anti-cancer agents are tamoxifen, vincristine, vinblastine,
podophyllotoxins, etoposide, teniposide, cisplatin, biologic response
modifiers such as interferon, IL-2, GM-CSF, enzymes, hormones and the
like. This method also provides a means for sorting the gene
transcripts by functional category. In the case of cancer cells,
transcription factors or other essential regulatory molecules are very
important categories to analyze across different libraries.

In yet another embodiment, comparative gene transcript frequency
analysis is used to differentiate between control liver cells and
liver cells isolated from patients treated with experimental drugs
like FIAU to distinguish between pathology caused by the underlying
disease and that caused by the drug.

In yet another embodiment, comparative gene transcript frequency
analysis is used to differentiate between brain tissue from patients
treated and untreated with lithium.

In a further embodiment, comparative gene transcript frequency
analysis is used to differentiate between cyclosporin and
FK506-treated cells and normal cells.

In a further embodiment, comparative gene transcript frequency
analysis is used to differentiate between virally infected (including
HIV-infected) human cells and uninfected human cells. Gene transcript
frequency analysis is also used to rapidly survey gene transcripts in
HIV-resistant, HIV-infected, and HIV-sensitive cells. Comparison of
gene transcript abundance will indicate the success of treatment
and/or new avenues to study.

In a further embodiment, comparative gene transcript frequency
analysis is used to differentiate between bronchial lavage fluids from
healthy and unhealthy patients with a variety of ailments.

In a further embodiment, comparative gene transcript frequency
analysis is used to differentiate between cell, plant, microbial and
animal mutants and wild-type species. In addition, the transcript
abundance program is adapted to permit the scientist to evaluate the
transcription of one gene in many different tissues. Such comparisons
could identify deletion mutants which do not produce a gene product
and point mutants which produce a less abundant or otherwise different
message. Such mutations can affect basic biochemical and
pharmacological processes, such as mineral nutrition and metabolism,
and can be isolated by means known to those skilled in the art. Thus
crops with improved yields, pest resistance and other factors can be
developed.

In a further embodiment, comparative gene transcript
frequency analysis is used for an interspecies comparative
analysis which would allow for the selection of better
pharmacologic animal models. In this embodiment, humans
and other animals (such as a mouse), or their cultured
cells, are treated with a specific test agent. The relative
sequence abundance of each cDNA population is determined.
If the animal test system is a good model, homologous genes
in the animal cDNA population should change expression
similarly to those in human cells. If side effects are
detected with the drug, a detailed transcript abundance
analysis will be performed to survey gene transcript
changes. Models will then be evaluated by comparing basic
physiological changes.

In a further embodiment, comparative gene transcript
frequency analysis is used in a clinical setting to give a
highly detailed gene transcript profile of a patient's
cells or tissue (for example, a blood sample). In
particular, gene transcript frequency analysis is used to
give a high resolution gene expression profile of a
diseased state or condition.
In the preferred embodiment, the method utilizes
high-throughput cDNA sequencing to identify specific
transcripts of interest. The generated cDNA and deduced
amino acid sequences are then extensively compared with
GENBANK and other sequence data banks as described below.
The method offers several advantages over current protein
discovery by two-dimensional gel methods which try to
identify individual proteins involved in a particular
biological effect. Here, detailed comparisons of profiles
of activated and inactive cells reveal numerous changes in
the expression of individual transcripts. After it is
determined if the sequence is an "exact" match, similar or
a non-match, the sequence is entered into a database.
Next, the numbers of copies of cDNA corresponding to each
gene are tabulated. Although this can be done slowly and
arduously, if at all, by human hand from a printout of all
entries, a computer program is a useful and rapid way to
tabulate this information. The numbers of cDNA copies
(optionally divided by the total number of sequences in the
data set) provide a picture of the relative abundance of
transcripts for each corresponding gene. The list of
represented genes can then be sorted by abundance in the
cDNA population. A multitude of additional types of
comparisons or dimensions are possible and are exemplified
below.

An alternate method of producing a gene transcript
image includes the steps of obtaining a mixture of test
mRNA and providing a representative array of unique probes
whose sequences are complementary to at least some of the
test mRNAs. Next, a fixed amount of the test mRNA is added
to the arrayed probes. The test mRNA is incubated with the
probes for a sufficient time to allow hybrids of the test
mRNA and probes to form. The mRNA-probe hybrids are
detected and the quantity determined. The hybrids are
identified by their location in the probe array. The
quantity of each hybrid is summed to give a population
number. Each hybrid quantity is divided by the population
number to provide a set of relative abundance data termed a
gene transcript image analysis.
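
By way of illustration only, the normalization of hybrid quantities described above may be sketched as follows. The function name `transcript_image`, the probe labels and the numeric quantities are hypothetical and are not taken from the specification; only the arithmetic (sum to a population number, then divide each quantity by it) follows the text.

```python
def transcript_image(hybrid_quantities):
    """Convert per-probe hybrid quantities into relative abundance data.

    hybrid_quantities maps a probe location (hypothetical labels here)
    to the measured quantity of mRNA-probe hybrid at that location.
    """
    # The "population number" is the sum of all hybrid quantities.
    population_number = sum(hybrid_quantities.values())
    if population_number == 0:
        raise ValueError("no hybrids detected; relative abundance is undefined")
    # Each hybrid quantity is divided by the population number.
    return {probe: quantity / population_number
            for probe, quantity in hybrid_quantities.items()}


image = transcript_image({"probe_A1": 30.0, "probe_A2": 10.0})
```

In this sketch the resulting values sum to one, so arrays loaded with different total signal can be compared on the same scale.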

6. EXAMPLES

The examples below are provided to illustrate the
subject invention. These examples are provided by way of
illustration and are not included for the purpose of
limiting the invention.

6.1. TISSUE SOURCES AND CELL LINES

For analysis with the computer program claimed herein,
biological sequences can be obtained from virtually any
source. Most popular are tissues obtained from the human
body. Tissues can be obtained from any organ of the body,
any age donor, any abnormality or any immortalized cell
line. Immortal cell lines may be preferred in some
instances because of their purity of cell type; other
tissue samples invariably include mixed cell types. A
special technique is available to take a single cell (for
example, a brain cell) and harness the cellular machinery
to grow up sufficient cDNA for sequencing by the techniques
and analysis described herein (cf. U.S. Patent Nos.
5,021,335 and 5,163,038, which are incorporated by
reference). The examples given herein utilized the
following immortalized cell lines: monocyte-like U-937
cells, activated macrophage-like THP-1 cells, induced
vascular endothelial cells (HUVEC cells) and mast cell-like
HMC-1 cells.

The U-937 cell line is a human histiocytic lymphoma
cell line with monocyte characteristics, established from
malignant cells obtained from the pleural effusion of a
patient with diffuse histiocytic lymphoma (Sundstrom, C.
and Nilsson, K. (1976) Int. J. Cancer 17:565). U-937 is
one of only a few human cell lines with the morphology,
cytochemistry, surface receptors and monocyte-like
characteristics of histiocytic cells. These cells can be
induced to terminal monocytic differentiation and will
express new cell surface molecules when activated with
supernatants from human mixed lymphocyte cultures. Upon
this type of in vitro activation, the cells undergo
morphological and functional changes, including
augmentation of antibody-dependent cellular cytotoxicity
(ADCC) against erythroid and tumor target cells (one of the
principal functions of macrophages). Activation of U-937
cells with phorbol 12-myristate 13-acetate (PMA) in vitro
stimulates the production of several compounds, including
prostaglandins, leukotrienes and platelet-activating factor
(PAF), which are potent inflammatory mediators. Thus,
U-937 is a cell line that is well suited for the
identification and isolation of gene transcripts associated
with normal monocytes.

The HUVEC cell line is a normal, homogeneous, well
characterized, early passage endothelial cell culture from
human umbilical vein (Cell Systems Corp., 12815 [...]
Street, Kirkland, WA 98034). Only gene transcripts from
induced, or treated, HUVEC cells were sequenced. One batch
of 1 X 10^[...] cells was treated for 5 hours with 1 U/ml rIL-1b
and 100 ng/ml E. coli lipopolysaccharide (LPS) endotoxin
prior to harvesting. A separate batch of 2 X 10^[...] cells was
treated at confluence with 4 U/ml TNF and 2 U/ml
interferon-gamma (IFN-gamma) prior to harvesting.
THP-1 is a human leukemic cell line with distinct
monocytic characteristics. This cell line was derived from
the blood of a 1-year-old boy with acute monocytic leukemia
(Tsuchiya, S. et al. (1980) Int. J. Cancer: 171-76). The
following cytological and cytochemical criteria were used
to determine the monocytic nature of the cell line: 1) the
presence of alpha-naphthyl butyrate esterase activity which
could be inhibited by sodium fluoride; 2) the production of
lysozyme; 3) the phagocytosis of latex particles and
sensitized SRBC (sheep red blood cells); and 4) the ability
of mitomycin C-treated THP-1 cells to activate
T-lymphocytes following ConA (concanavalin A) treatment.
Morphologically, the cytoplasm contained small azurophilic
granules and the nucleus was indented and irregularly
shaped with deep folds. The cell line had Fc and C3b
receptors, probably functioning in phagocytosis. THP-1
cells treated with the tumor promoter
12-O-tetradecanoyl-phorbol-13-acetate (TPA) stop proliferating and
differentiate into macrophage-like cells which mimic native
monocyte-derived macrophages in several respects.
Morphologically, as the cells change shape, the nucleus
becomes more irregular and additional phagocytic vacuoles
appear in the cytoplasm. The differentiated THP-1 cells
also exhibit an increased adherence to tissue culture
plastic.

HMC-1 cells (a human mast cell line) were established
from the peripheral blood of a Mayo Clinic patient with
mast cell leukemia (Leukemia Res. (1988) 12:345-55). The
cultured cells looked similar to immature cloned murine
mast cells, contained histamine, and stained positively for
chloroacetate esterase, amino caproate esterase, eosinophil
major basic protein (MBP) and tryptase. The HMC-1 cells
have, however, lost the ability to synthesize normal IgE
receptors. HMC-1 cells also possess a 10;16 translocation,
present in cells initially collected by leukophoresis from
the patient and not an artifact of culturing. Thus, HMC-1
cells are a good model for mast cells.

6.2. CONSTRUCTION OF cDNA LIBRARIES

For inter-library comparisons, the libraries must be
prepared in similar manners. Certain parameters appear to
be particularly important to control. One such parameter
is the method of isolating mRNA. It is important to use
the same conditions to remove DNA and heterogeneous nuclear
RNA from comparison libraries. Size fractionation of cDNA
must be carefully controlled. The same vector preferably
should be used for preparing libraries to be compared. At
the very least, the same type of vector (e.g.,
unidirectional vector) should be used to assure a valid
comparison. A unidirectional vector may be preferred in
order to more easily analyze the output.

It is preferred to prime only with oligo dT
unidirectional primer in order to obtain only one clone per
mRNA transcript when obtaining cDNAs. However, it is
recognized that employing a mixture of oligo dT and random
primers can also be advantageous because such a mixture
results in more sequence diversity when gene discovery also
is a goal. Similar effects can be obtained with DR2
(Clontech) and HXLOX (US Biochemical) and also vectors from
Invitrogen and Novagen. These vectors have two
requirements. First, there must be primer sites for
commercially available primers such as T3 or M13 reverse
primers. Second, the vector must accept inserts up to 10
kB.

It also is important that the clones be randomly
sampled, and that a significant population of clones is
used. Data have been generated with 5,000 clones; however,
if very rare genes are to be obtained and/or their relative
abundance determined, as many as 100,000 clones from a
single library may need to be sampled. Size fractionation
of cDNA also must be carefully controlled. Alternately,
plaques can be selected rather than clones.

Besides the Uni-ZAP™ vector system by Stratagene
disclosed below, it is now believed that other similarly
unidirectional vectors also can be used. For example, it
is believed that such vectors include but are not limited
to DR2 (Clontech) and HXLOX (U.S. Biochemical).

Preferably, the details of library construction (as
shown in Figure 1) are collected and stored in a database
for later retrieval relative to the sequences being
compared. Fig. 1 shows important information regarding the
library collaborator or cell or cDNA supplier,
pretreatment, biological source, culture, mRNA preparation
and cDNA construction. Similarly detailed information
about the other steps is beneficial in analyzing sequences
and libraries in depth.
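
As a non-limiting sketch, a record of library construction details of the kind listed above could be stored and retrieved as follows. The class name `LibraryRecord`, its field names and the dictionary `library_db` are assumptions made for illustration; Figure 1 itself is not reproduced here, and the example values are drawn only from the THP-1 library description in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LibraryRecord:
    """One entry of library construction details (hypothetical layout)."""
    library_name: str
    supplier: str            # library collaborator or cell/cDNA supplier
    pretreatment: str
    biological_source: str
    culture: str
    mrna_preparation: str
    cdna_construction: str


# A plain dictionary stands in for the database of library details.
library_db = {}


def store(record):
    """Store a record so it can be retrieved later by library name."""
    library_db[record.library_name] = record


store(LibraryRecord(
    library_name="THP-1",
    supplier="Stratagene",
    pretreatment="48 hours with 100 nM TPA, 4 hours with 1 ug/ml LPS",
    biological_source="THP-1 human leukemic cell line",
    culture="not recorded in this sketch",
    mrna_preparation="poly(A+) RNA purified",
    cdna_construction="oligo dT and random hexamer priming, Uni-ZAP vector",
))
```

Keeping these details beside the sequence data makes it possible to check, before a comparison, that two libraries were prepared in similar manners.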

RNA must be harvested from cells and tissue samples
and cDNA libraries are subsequently constructed. cDNA
libraries can be constructed according to techniques known
in the art. (See, for example, Maniatis, T. et al. (1982)
Molecular Cloning, Cold Spring Harbor Laboratory, New
York). cDNA libraries may also be purchased. The U-937
cDNA library (catalog No. 937207) was obtained from
Stratagene, Inc., 11099 N. Torrey Pines Rd., La Jolla, CA
92037.

The THP-1 cDNA library was custom constructed by
Stratagene from THP-1 cells cultured 48 hours with 100 nM
TPA and 4 hours with 1 µg/ml LPS. The human mast cell HMC-1
cDNA library was also custom constructed by Stratagene
from cultured HMC-1 cells. The HUVEC cDNA library was
custom constructed by Stratagene from two batches of
induced HUVEC cells which were separately processed.

Essentially, all the libraries were prepared in the
same manner. First, poly(A+) RNA (mRNA) was purified. For
the U-937 and HMC-1 RNA, cDNA synthesis was only primed
with oligo dT. For the THP-1 and HUVEC RNA, cDNA synthesis
was primed separately with both oligo dT and random
hexamers, and the two cDNA libraries were treated
separately. Synthetic adaptor oligonucleotides were
ligated onto cDNA ends enabling its insertion into the
Uni-ZAP™ vector system (Stratagene), allowing high efficiency
unidirectional (sense orientation) lambda library
construction and the convenience of a plasmid system with
blue-white color selection to detect clones with cDNA
insertions. Finally, the two libraries were combined into
a single library by mixing equal numbers of bacteriophage.

The libraries can be screened with either DNA probes
or antibody probes, and the pBluescript® phagemid
(Stratagene) can be rapidly excised in vivo. The phagemid
allows the use of a plasmid system for easy insert
characterization, sequencing, site-directed mutagenesis,
the creation of unidirectional deletions and expression of
fusion proteins. The custom-constructed library phage
particles were infected into E. coli host strain XL1-Blue
(Stratagene), which has a high transformation efficiency,
increasing the probability of obtaining rare,
under-represented clones in the cDNA library.

6.3. ISOLATION OF cDNA CLONES

The phagemid forms of individual cDNA clones were
obtained by the in vivo excision process, in which the host
bacterial strain was coinfected with both the lambda
library phage and an f1 helper phage. Proteins derived
from both the library-containing phage and the helper phage
nicked the lambda DNA, initiated new DNA synthesis from
defined sequences on the lambda target DNA and created a
smaller, single stranded circular phagemid DNA molecule
that included all DNA sequences of the pBluescript® plasmid
and the cDNA insert. The phagemid DNA was secreted from
the cells and purified, then used to re-infect fresh host
cells, where the double stranded phagemid DNA was produced.
Because the phagemid carries the gene for beta-lactamase,
the newly-transformed bacteria are selected on medium
containing ampicillin.

Phagemid DNA was purified using the Magic Minipreps™
DNA Purification System (Promega catalogue #A7100, Promega
Corp., 2800 Woods Hollow Rd., Madison, WI 53711). This
small-scale process provides a simple and reliable method
for lysing the bacterial cells and rapidly isolating
purified phagemid DNA using a proprietary DNA-binding
resin. The DNA was eluted from the purification resin
already prepared for DNA sequencing and other analytical
manipulations.

Phagemid DNA was also purified using the QIAwell-8
Plasmid Purification System from QIAGEN® DNA Purification
System (QIAGEN Inc., 9259 Eton Ave., Chatsworth, CA
91311). This product line provides a convenient, rapid and
reliable high-throughput method for lysing the bacterial
cells and isolating highly purified phagemid DNA using
QIAGEN anion-exchange resin particles with EMPORE™ membrane
technology from 3M in a multiwell format. The DNA was
eluted from the purification resin already prepared for DNA
sequencing and other analytical manipulations.

An alternate method of purifying phagemid has recently
become available. It utilizes the Miniprep Kit (Catalog
No. 77468, available from Advanced Genetic Technologies
Corp., 19212 Orbit Drive, Gaithersburg, Maryland). This
kit is in the 96-well format and provides enough reagents
for 960 purifications. Each kit is provided with a
recommended protocol, which has been employed except for
the following changes. First, the 96 wells are each filled
with only 1 ml of sterile terrific broth with carbenicillin
at 25 mg/L and glycerol at 0.4%. After the wells are
inoculated, the bacteria are cultured for 24 hours and
lysed with 60 µl of lysis buffer. A centrifugation step
(2900 rpm for 5 minutes) is performed before the contents
of the block are added to the primary filter plate. The
optional step of adding isopropanol to TRIS buffer is not
routinely performed. After the last step in the protocol,
samples are transferred to a Beckman 96-well block for
storage.

Another new DNA purification system is the WIZARD™
product line which is available from Promega (catalog No.
A7071) and may be adaptable to the 96-well format.

6.4. SEQUENCING OF cDNA CLONES

The cDNA inserts from random isolates of the U-937 and
THP-1 libraries were sequenced in part. Methods for DNA
sequencing are well known in the art. Conventional
enzymatic methods employ DNA polymerase Klenow fragment,
Sequenase® or Taq polymerase to extend DNA chains from an
oligonucleotide primer annealed to the DNA template of
interest. Methods have been developed for the use of both
single- and double-stranded templates. The chain
termination reaction products are usually electrophoresed
on urea-acrylamide gels and are detected either by
autoradiography (for radionuclide-labeled precursors) or by
fluorescence (for fluorescent-labeled precursors). Recent
improvements in mechanized reaction preparation, sequencing
and analysis using the fluorescent detection method have
permitted expansion in the number of sequences that can be
determined per day (such as the Applied Biosystems 373 and
377 DNA sequencer, Catalyst 800). Currently with the
system as described, read lengths range from 250 to 400
bases and are clone dependent. Read length also varies
with the length of time the gel is run. In general, the
shorter runs tend to truncate the sequence. A minimum of
only about 25 to 50 bases is necessary to establish the
identification and degree of homology of the sequence.
Gene transcript imaging can be used with any
sequence-specific method, including, but not limited to,
hybridization, mass spectroscopy, capillary electrophoresis
and 505 gel electrophoresis.

6.5. HOMOLOGY SEARCHING OF cDNA CLONE AND
DEDUCED PROTEIN

Using the nucleotide sequences derived from the cDNA
clones as query sequences (sequences of a Sequence
Listing), databases containing previously identified
sequences are searched for areas of homology (similarity).
Examples of such databases include Genbank and EMBL. We
next describe examples of two homology search algorithms
that can be used, and then describe the subsequent
computer-implemented steps to be performed in accordance
with preferred embodiments of the invention.

In the following description of the computer-implemented
steps of the invention, the word "library"
denotes a set (or population) of biological specimen
nucleic acid sequences. A "library" can consist of cDNA
sequences, RNA sequences, or the like, which characterize a
biological specimen. The biological specimen can consist
of cells of a single human cell type (or can be any of the
other above-mentioned types of specimens). We contemplate
that the sequences in a library have been determined so as
to accurately represent or characterize a biological
specimen (for example, they can consist of representative
cDNA sequences from clones of RNA taken from a single human
cell).

In the following description of the computer-implemented
steps of the invention, the expression
"database" denotes a set of stored data which represent a
collection of sequences, which in turn represent a
collection of biological reference materials. For example,
a database can consist of data representing many stored
cDNA sequences which are in turn representative of human
cells infected with various viruses, cells of humans of
various ages, cells from different mammalian species, and
so on.

In preferred embodiments, the invention employs a
computer programmed with software (to be described) for
performing the following steps:

(a) processing data indicative of a library of cDNA
sequences (generated as a result of high-throughput cDNA
sequencing or other method) to determine whether each
sequence in the library matches a DNA sequence of a
reference database of DNA sequences (and if so, identifying
the reference database entry which matches the sequence and
indicating the degree of match between the reference
sequence and the library sequence) and assigning an
identified sequence value based on the sequence annotation
and degree of match to each of the sequences in the
library;

(b) for some or all entries of the database,
tabulating the number of matching identified sequence


WO 95/20681    PCT/US95/01160

values in the library (although this can be done by human
hand from a printout of all entries, we prefer to perform
this step using computer software to be described below),
thereby generating a set of final data values or "abundance
numbers"; and

(c) if the libraries are different sizes, dividing
each abundance number by the total number of sequences in
the library, to obtain a relative abundance number for each
identified sequence value (i.e., a relative abundance of
each gene transcript).

The list of identified sequence values (or genes
corresponding thereto) can then be sorted by abundance in
the cDNA population. A multitude of additional types of
comparisons or dimensions are possible.
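
Steps (b) and (c) and the subsequent sort may be sketched, under stated assumptions, as follows. The function name `abundance_table` and the gene labels are hypothetical; the sketch assumes step (a) has already reduced each library sequence to an identified sequence value, here represented simply as a string.

```python
from collections import Counter


def abundance_table(identified_values):
    """Tabulate abundance numbers and relative abundance for one library.

    identified_values holds one identified sequence value per sequenced
    clone (hypothetical string labels in this sketch).
    Returns rows of (value, abundance number, relative abundance),
    sorted with the most abundant first.
    """
    counts = Counter(identified_values)      # step (b): abundance numbers
    library_size = len(identified_values)    # total sequences in the library
    rows = [(value, n, n / library_size)     # step (c): relative abundance
            for value, n in counts.items()]
    # Sort by descending abundance; ties are broken by label for stability.
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


table = abundance_table(["actin", "actin", "ferritin", "actin"])
```

Because the relative abundance is a fraction of the library size, rows from libraries of different sizes can be compared directly.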

For example (to be described below in greater detail),
steps (a) and (b) can be repeated for two different
libraries (sometimes referred to as a "target" library and
a "subtractant" library). Then, for each identified
sequence value (or gene transcript), a "ratio" value is
obtained by dividing the abundance number (for that
identified sequence value) for the target library by the
abundance number (for that identified sequence value) for
the subtractant library.
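
A minimal sketch of the ratio calculation follows. The function name `ratio_values` and the example counts are hypothetical. The passage above does not say how a transcript absent from the subtractant library is handled; reporting it as infinity is an assumption made only for this sketch.

```python
def ratio_values(target, subtractant):
    """Divide target abundance numbers by subtractant abundance numbers.

    Both arguments map an identified sequence value to its abundance
    number in the respective library.
    """
    ratios = {}
    for value, target_count in target.items():
        subtractant_count = subtractant.get(value, 0)
        if subtractant_count:
            ratios[value] = target_count / subtractant_count
        else:
            # Assumption: a transcript not seen in the subtractant
            # library is reported as an infinite ratio.
            ratios[value] = float("inf")
    return ratios


ratios = ratio_values({"gene_1": 8, "gene_2": 3}, {"gene_1": 2})
```

Swapping the two arguments gives the reverse comparison without any further laboratory work.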

In fact, subtraction may be carried out on multiple
libraries. It is possible to add the transcripts from
several libraries (for example, three) and then to divide
them by another set of transcripts from multiple libraries
(again, for example, three). Notation for this operation
may be abbreviated as (A+B+C) / (D+E+F), where the capital
letters each indicate an entire set of transcripts.
Optionally the abundance numbers of transcripts in the
summed libraries may be divided by the total sample size
before subtraction.
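
The pooled operation (A+B+C) / (D+E+F) may be sketched as below. The function name `pooled_ratio`, the `normalize` flag and the example counts are hypothetical. Skipping transcripts that are absent from the denominator pool is an assumption of this sketch, not a rule stated in the text.

```python
from collections import Counter


def pooled_ratio(numerator_libraries, denominator_libraries, normalize=False):
    """Compute (A+B+C) / (D+E+F) per identified sequence value.

    Each library maps an identified sequence value to its abundance
    number. With normalize=True, pooled abundance numbers are divided
    by the pooled sample size before the ratio is taken.
    """
    def pool(libraries):
        pooled = Counter()
        for library in libraries:
            pooled.update(library)           # add abundance numbers
        if normalize:
            sample_size = sum(pooled.values())
            return {v: n / sample_size for v, n in pooled.items()}
        return dict(pooled)

    numerator = pool(numerator_libraries)
    denominator = pool(denominator_libraries)
    # Assumption: values missing from the denominator pool are skipped.
    return {v: numerator[v] / denominator[v]
            for v in numerator if denominator.get(v)}


A, B, C = {"g1": 2}, {"g1": 3, "g2": 1}, {"g1": 1}
D, E, F = {"g1": 1}, {"g1": 1, "g2": 1}, {"g1": 1}
raw = pooled_ratio([A, B, C], [D, E, F])
```

With `normalize=True` the same call compares the pools as fractions of their total sample sizes, which matters when the two pools contain different numbers of sequenced clones.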

Unlike standard hybridization technology which permits
a single subtraction of two libraries, once one has
sequenced a set of transcripts and stored
them in the computer, any number of subtractions can be
performed on the library. For example, by this method,
ratio values can be obtained by dividing relative abundance
values in a first library by corresponding values in a
second library and vice versa.

In variations on step (a), the library consists of
nucleotide sequences derived from cDNA clones. Examples of
databases which can be searched for areas of homology
(similarity) in step (a) include the commercially available
databases known as Genbank (NIH), EMBL (European Molecular
Biology Labs, Germany), and GENESEQ (Intelligenetics,
Mountain View, California).

One homology search algorithm which can be used to
implement step (a) is the algorithm described in the paper
by D.J. Lipman and W. Pearson, entitled "Rapid and
Sensitive Protein Similarity Searches," Science 227:1435
(1985). In this algorithm, the homologous regions are
searched in a two-step manner. In the first step, the
highest homologous regions are determined by calculating a
matching score using a homology score table. The parameter
"Ktup" is used in this step to establish the minimum window
size to be shifted for comparing two sequences. Ktup also
sets the number of bases that must match to extract the
highest homologous region among the sequences. In this
step, no insertions or deletions are applied and the
homology is displayed as an initial (INIT) value.

In the second step, the homologous regions are aligned
to obtain the highest matching score by inserting a gap in
order to add a probable deleted portion. The matching
score obtained in the first step is recalculated using the
homology score table and the insertion score table to an
optimized (OPT) value in the final output.
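
The role of Ktup in the first, gap-free step can be illustrated with a much simplified sketch. It is not the published algorithm: it counts identical words of length `ktup` on each diagonal rather than applying a homology score table, and the function name `best_diagonal` and the test sequences are hypothetical.

```python
from collections import defaultdict


def best_diagonal(query, target, ktup=4):
    """Find the ungapped alignment offset sharing the most ktup-length words.

    Returns (offset, hits), where offset = target position - query
    position, or (None, 0) if the sequences share no word.
    """
    # Index every word of length ktup in the target by its start position.
    index = defaultdict(list)
    for j in range(len(target) - ktup + 1):
        index[target[j:j + ktup]].append(j)

    # Each shared word votes for the diagonal (offset) on which it lies.
    hits = defaultdict(int)
    for i in range(len(query) - ktup + 1):
        for j in index.get(query[i:i + ktup], ()):
            hits[j - i] += 1

    if not hits:
        return None, 0
    offset = max(hits, key=hits.get)
    return offset, hits[offset]


result = best_diagonal("ACGTACGT", "TTACGTACGTTT", ktup=4)
```

A larger `ktup` makes the search faster and stricter, since longer exact words must match before a diagonal receives any votes; a second, gapped pass would then refine only the best-scoring regions.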

DNA homologies between two sequences can be examined
graphically using the Harr method of constructing dot
matrix homology plots (Needleman, S.B. and Wunsch, C.O.,
J. Mol. Biol. 48:443 (1970)). This method produces a
two-dimensional plot which can be useful in determining
regions of homology versus regions of repetition.
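
A text-only dot matrix can be sketched as follows, as an illustration of the idea rather than of any particular plotting program. The function name `dot_matrix` and the `window` parameter are assumptions for this sketch.

```python
def dot_matrix(seq_a, seq_b, window=3):
    """Return rows of '*' and '.' marking identical windows.

    Row i, column j is '*' when the window of seq_a starting at i is
    identical to the window of seq_b starting at j.
    """
    rows = []
    for i in range(len(seq_a) - window + 1):
        row = ""
        for j in range(len(seq_b) - window + 1):
            same = seq_a[i:i + window] == seq_b[j:j + window]
            row += "*" if same else "."
        rows.append(row)
    return rows


for line in dot_matrix("ATATGC", "ATATGC", window=2):
    print(line)
```

In such a plot a single long diagonal indicates a region of homology, while additional parallel diagonals (as produced here by the repeated "AT") indicate regions of repetition.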

However, in a class of preferred embodiments, step (a)
is implemented by processing the library data in the
commercially available computer program known as the
INHERIT 670 Sequence Analysis System, available from
Applied Biosystems Inc. (Foster City, California),
including the software known as the Factura software (also
available from Applied Biosystems Inc.). The Factura
program preprocesses each library sequence to "edit out"
portions thereof which are not likely to be of interest,
such as the vector used to prepare the library. Additional
sequences which can be edited out or masked (ignored by the
search tools) include but are not limited to the polyA tail
and repetitive GAG and CCC sequences. A low-end search
program can be written to mask out such "low-information"
sequences, or programs such as BLAST can ignore the
low-information sequences.
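
A simple masking routine of the kind mentioned above might look like the following sketch. It is not the Factura software; the function name `mask_low_information`, the thresholds `min_tail` and `min_repeats`, and the use of "N" as the masking character are all assumptions chosen for illustration.

```python
import re


def mask_low_information(sequence, min_tail=10, min_repeats=4):
    """Replace a trailing polyA tail and GAG/CCC repeats with 'N'.

    min_tail is the shortest run of A at the 3' end treated as a tail;
    min_repeats is the fewest consecutive GAG or CCC units masked.
    """
    sequence = sequence.upper()

    def to_n(match):
        # Keep the sequence length unchanged so coordinates still apply.
        return "N" * len(match.group())

    # Mask a polyA tail only where it ends the sequence.
    sequence = re.sub(r"A{%d,}$" % min_tail, to_n, sequence)
    # Mask runs of the repetitive triplets anywhere in the sequence.
    repeats = r"(?:GAG){%d,}|(?:CCC){%d,}" % (min_repeats, min_repeats)
    return re.sub(repeats, to_n, sequence)


masked = mask_low_information("ACGT" + "A" * 12)
```

Masking rather than deleting keeps positions in the masked sequence aligned with the original read.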

In the algorithm implemented by the INHERIT 670
Sequence Analysis System, the Pattern Specification
Language (developed by TRW Inc.) is used to determine
regions of homology. "There are three parameters that
determine how INHERIT analysis runs sequence comparisons:
window size, window offset and error tolerance. Window
size specifies the length of the segments into which the
query sequence is subdivided. Window offset specifies
where to start the next segment, counting
from the beginning of the previous segment. Error
tolerance specifies the total number of insertions,
deletions and/or substitutions that are tolerated over the
specified word length. Error tolerance may be set to any
integer between 0 and [...]. The default settings are window
[...], window offset [...] and error tolerance=3."
INHERIT Analysis Users Manual, pp. 2-15. Version 1.0.
Applied Biosystems, Inc., October 1991.
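
The interplay of window size and window offset can be sketched as follows. This is not the INHERIT implementation or the Pattern Specification Language; the function names and example values are hypothetical, and the tolerance check below counts substitutions only, whereas the quoted description also allows insertions and deletions.

```python
def segment_query(query, window_size, window_offset):
    """Subdivide a query into segments of window_size.

    Each segment starts window_offset positions after the start of the
    previous one. Returns a list of (start, segment) pairs.
    """
    last_start = len(query) - window_size
    return [(start, query[start:start + window_size])
            for start in range(0, last_start + 1, window_offset)]


def within_tolerance(segment, candidate, error_tolerance):
    """Check whether two equal-length words differ by few enough bases.

    Simplification: only substitutions are counted in this sketch.
    """
    if len(segment) != len(candidate):
        return False
    mismatches = sum(1 for a, b in zip(segment, candidate) if a != b)
    return mismatches <= error_tolerance


segments = segment_query("ABCDEFGHIJ", window_size=4, window_offset=3)
```

When the offset is smaller than the window size, consecutive segments overlap, so a homologous region is less likely to be split across a segment boundary.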

Using a combination of these three parameters, a
database (such as a DNA database) can be searched for
sequences containing regions of homology and the
appropriate sequences are scored with an initial value.
Subsequently, these homologous regions are examined using
dot matrix homology plots to determine regions of homology
versus regions of repetition. Smith-Waterman alignments
can be used to display the results of the homology search.
The INHERIT software can be executed by a Sun computer
system programmed with the UNIX operating system.

Search alternatives to INHERIT include the BLAST
program, GCG (available from the Genetics Computer Group,
WI) and the Dasher program (Temple Smith, Boston
University, Boston, MA). Nucleotide sequences can be
searched against Genbank, EMBL or custom databases such as
GENESEQ (available from Intelligenetics, Mountain View, CA)
or other databases for genes. In addition, we have
searched some sequences against our own in-house database.

In preferred embodiments, the transcript sequences are
analyzed by the INHERIT software for best conformance with
a reference gene transcript to assign a sequence identifier
and assigned the degree of homology, which together are the
identified sequence value, and are input into, and further
processed by, a Macintosh personal computer (available from
Apple) programmed with an "abundance sort and subtraction
analysis" computer program (to be described below).

Prior to the abundance sort and subtraction analysis
program (also denoted as the "abundance sort" program),
identified sequences from the cDNA clones are assigned
value (according to the parameters given above) by degree
of match according to the following categories: "exact"
matches (regions with a high degree of identity),
homologous human matches (regions of high similarity, but
not "exact" matches), homologous non-human matches (regions
of high similarity present in species other than human), or
non matches (no significant regions of homology to
previously identified nucleotide sequences stored in the
form of the database). Alternately, the degree of match
can be a numeric value as described below.
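
The four categories can be represented as in the sketch below. The category names follow the text, but the function name `categorize`, the use of percent identity as the measure, and both cutoff values are purely hypothetical; the specification does not state numeric thresholds in this passage.

```python
from enum import Enum


class MatchCategory(Enum):
    EXACT = "exact"
    HOMOLOGOUS_HUMAN = "homologous human"
    HOMOLOGOUS_NON_HUMAN = "homologous non-human"
    NON_MATCH = "non-match"


def categorize(percent_identity, species,
               exact_cutoff=98.0, homology_cutoff=70.0):
    """Assign a degree-of-match category to a best database hit.

    Both cutoffs are illustrative assumptions, not values from the text.
    """
    if percent_identity >= exact_cutoff:
        return MatchCategory.EXACT
    if percent_identity >= homology_cutoff:
        if species == "human":
            return MatchCategory.HOMOLOGOUS_HUMAN
        return MatchCategory.HOMOLOGOUS_NON_HUMAN
    return MatchCategory.NON_MATCH


category = categorize(85.0, "mouse")
```

Pairing the category with the matched sequence identifier gives one possible form of the identified sequence value that is later tabulated.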

With reference again to the step of identifying
matches between reference sequences and database entries,
protein and peptide sequences can be deduced from the
nucleic acid sequences. Using the deduced polypeptide
sequence, the match identification can be performed in a
manner analogous to that done with cDNA sequences. A
protein sequence is used as a query sequence and compared
to the previously identified sequences contained in a
database such as the Swiss/Prot, PIR and the NBRF Protein
database to find homologous proteins. These proteins are
initially scored for homology using a homology score table
(Orcutt, B.C. and Dayhoff, M.O., Scoring Matrices, PIR
Report MAT-0285 (February, 1985)), resulting in an INIT
score. The homologous regions are aligned to obtain the
highest matching scores by [...] probable deleted portion.
The matching score is recalculated using the homology score
table and the insertion score table, resulting in an
optimized (OPT) score. Even in the absence of knowledge of
the proper reading frame of an isolated sequence, the
above-described protein homology search may be performed by
searching all 3 reading frames.
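By way of illustration only, searching all three reading frames can be sketched as translating the nucleotide sequence once per frame offset and submitting each deduced peptide as a query. The sketch below is in Python rather than the FoxBASE of Table 5; the function name is hypothetical, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order (an assumption;
# other translation tables exist).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate_three_frames(dna):
    """Return the deduced peptide for each of the 3 forward reading frames."""
    dna = dna.upper()
    frames = []
    for offset in range(3):
        codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
        # Codons containing ambiguous bases are reported as 'X'.
        frames.append("".join(CODON_TABLE.get(c, "X") for c in codons))
    return frames


peptides = translate_three_frames("ATGGCCTAA")
```

Each of the three returned peptides would then be used in turn as the query sequence for the protein homology search.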

Peptide and protein sequence homologies can also be
ascertained using the INHERIT 670 Sequence Analysis System
in an analogous way to that used in DNA sequence
homologies. Pattern Specification Language and parameter
windows are used to search protein databases for sequences
containing regions of homology, which are scored with an
initial value. Subsequent display in a dot-matrix homology
plot shows regions of homology versus regions of
repetition. Additional search tools that are available to
use on pattern search databases include PLsearch Blocks
(available from Henikoff & Henikoff, University of
Washington, Seattle), Dasher and GCG. Pattern search
databases include, but are not limited to, Protein Blocks
(available from Henikoff & Henikoff, University of
Washington, Seattle), Brookhaven Protein (available from
the Brookhaven National Laboratory, Brookhaven, MA),
PROSITE (available from Amos Bairoch, University of Geneva,
Switzerland), ProDom (available from Temple Smith, Boston
University), and PROTEIN MOTIF FINGERPRINT (available from
University of Leeds, United Kingdom).

The ABI Assembler application software, part of the
INHERIT DNA analysis system (available from Applied
Biosystems, Inc., Foster City, CA), can be employed to
create and manage sequence assembly projects by assembling
data from selected sequence fragments into a larger
sequence. The Assembler software combines two advanced
computer technologies which maximize the ability to
assemble sequenced DNA fragments into Assemblages, a
special grouping of data where the relationships between
sequences are shown by graphic overlap, alignment and
statistical views. The process is based on the
Meyers-Kececioglu model of fragment assembly (INHERIT
Assembler User's Manual, Applied Biosystems, Inc., Foster
City, CA), and uses graph theory as the foundation of a
very rigorous multiple sequence alignment engine for
assembling DNA sequence fragments. Other assembly programs
that can be used include MEGALIGN (available from DNASTAR
Inc., Madison, WI), Dasher and STADEN (available from Roger
Staden, Cambridge, England).

Next, with reference to Fig. 2, we describe in more
detail the "abundance sort" program which implements
above-mentioned "step (b)" to tabulate the number of
sequences of the library which match each database entry
(the "abundance number" for each database entry).
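As a minimal sketch of the tabulation just described (not the Table 5 listing, which is written in FoxBASE), the abundance number of each database entry can be obtained by counting how many library sequences were identified with that entry. The record layout (clone identifier, matched entry or None) and the sample identifiers are assumptions made for illustration.

```python
from collections import Counter


def abundance_numbers(identified_sequences):
    """Count library sequences per matched database entry.

    identified_sequences: iterable of (clone_id, entry) pairs, where entry
    is None when the clone matched no database entry (assumed layout).
    """
    return Counter(entry for _clone, entry in identified_sequences
                   if entry is not None)


# Hypothetical identified sequences for four clones.
records = [("c1", "HSRPS8"), ("c2", "HUMLAMB"), ("c3", "HSRPS8"), ("c4", None)]
counts = abundance_numbers(records)
```

Here the entry matched by two clones receives an abundance number of 2, and the unmatched clone contributes to no entry.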

Fig. 2 is a flow chart of a preferred embodiment of
the abundance sort program. A source code listing of this
embodiment of the abundance sort program is set forth in
Table 5. In the Table 5 implementation, the abundance sort
program is written using the FoxBASE programming language
commercially available from Microsoft Corporation.
Although FoxBASE was the program chosen for the first
iteration of this technology, it should not be considered
limiting. Many other programming languages, Sybase being a
particularly desirable alternative, can also be used, as
will be obvious to one with ordinary skill in the art. The
subroutine names specified in Fig. 2 correspond to
subroutines listed in Table 5.

With reference again to Fig. 2, the "Identified
Sequences" are transcript sequences representing each
sequence of the library and a corresponding identification
of the database entry (if any) which it matches. In other
words, the "Identified Sequences" are transcript sequences
representing the output of above-discussed "step (a)."

Fig. 3 is a block diagram of a system for implementing
the invention. The Fig. 3 system includes library
generation unit 2 which generates a library and asserts an
output stream of transcript sequences indicative of the
biological sequences comprising the library. Programmed
processor 4 receives the data stream output from unit 2 and
processes this data in accordance with above-discussed
"step (a)" to generate the Identified Sequences. Processor
4 can be a processor programmed with the commercially
available computer program known as the INHERIT 670
Sequence Analysis System and the commercially available
computer program known as the Factura program (both
available from Applied Biosystems Inc.) and with the UNIX
operating system.

Still with reference to Fig. 3, the Identified
Sequences are loaded into processor 6, which is programmed
with the abundance sort program. Processor 6 generates the
Final Transcript sequences indicated in both Figs. 2 and 3.
Fig. 4 shows a more detailed block diagram of a planned
relational computer system, including various searching
techniques which can be implemented, along with an
assortment of databases to query against.
With reference to Fig. 2, the abundance sort program
first performs an operation known as "Tempnum" on the
Identified Sequences, to discard all of the Identified
Sequences except those which match database entries of
selected types. For example, the Tempnum process can
select Identified Sequences which represent matches of the
following types with database entries (see above for
definition): "exact" matches, human "homologous" matches,
"other species" matches (representing genes present in
species other than human), "no" matches (no significant
regions of homology with database entries representing
previously identified nucleotide sequences), "I" matches
(Incyte for not previously known DNA sequences), or "X"
matches (matches ESTs in reference database). This
eliminates the U, S, M, V, A, R and D sequences (see Table 1
for definitions).
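The Tempnum selection can be pictured as a simple filter on the match-type code carried by each identified sequence. The following is an illustrative sketch only: the codes "I" and "X" and the eliminated codes U, S, M, V, A, R and D come from the text above, while the letters "E", "H", "O" and "N" for the exact, homologous, other-species and no-match categories, and the field name match_type, are assumptions (Table 1 holds the actual definitions).

```python
# Match-type codes retained by the filter. "E", "H", "O" and "N" are
# hypothetical stand-ins; "I" and "X" are named in the text.
KEPT_TYPES = frozenset({"E", "H", "O", "N", "I", "X"})


def tempnum(identified_sequences, kept_types=KEPT_TYPES):
    """Keep only identified sequences whose match type is selected."""
    return [rec for rec in identified_sequences
            if rec["match_type"] in kept_types]


# Hypothetical records; the "M" and "R" records are discarded.
records = [
    {"entry": "HSRPS8", "match_type": "E"},
    {"entry": "NCY015795", "match_type": "I"},
    {"entry": "entry_m", "match_type": "M"},
    {"entry": "entry_r", "match_type": "R"},
]
selected = tempnum(records)
```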

The identified sequence values selected during the
"Tempnum" process then undergo a further selection (weeding
out) operation known as "Tempred." This operation can, for
example, discard all identified sequence values
representing matches with selected database entries.

The identified sequence values selected during the
"Tempred" process are then classified according to library,
during the "Tempdesig" operation. It is contemplated that
the "Identified Sequences" can represent sequences from a
single library, or from two or more libraries.

Consider first the case that the identified sequence
values represent sequences from a single library. In this
case, all the identified sequence values determined during
"Tempred" undergo sorting in the "Templib" operation,
further sorting in the "Libsort" operation, and finally
additional sorting in the "Temptarsort" operation. For
example, these three sorting operations can sort the
identified sequences in order of decreasing "abundance
number" (to generate a list of decreasing abundance
numbers, each abundance number corresponding to a unique
identified sequence entry, or several lists of decreasing
abundance numbers, with the abundance numbers in each list
corresponding to database entries of a selected type) with
redundancies eliminated from each sorted list. In this
case, the operation identified as "Cruncher" can be
bypassed, so that the "Final Data" values are the organized
transcript sequences produced during the "Temptarsort"
operation.
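The net effect of the three sorting operations in the single-library case can be sketched as follows. This is an illustration under stated assumptions, not a transcription of the Templib, Libsort and Temptarsort subroutines: the input is taken to be (entry, abundance number) pairs that may contain repeats, and ties are broken alphabetically by entry, a choice the text does not specify.

```python
def sorted_abundance_list(records):
    """Return unique (entry, abundance) pairs in decreasing abundance order.

    records: iterable of (entry, abundance) pairs, possibly with repeats.
    """
    unique = dict(records)  # repeated entries collapse to a single row
    return sorted(unique.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical input with one redundant row.
rows = [("HSRPS8", 6), ("HUMLAMB", 7), ("HSRPS8", 6), ("HSP27", 6)]
final_data = sorted_abundance_list(rows)
```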

We next consider the case that the transcript
sequences produced during the "Tempred" operation represent
sequences from two libraries (which we will denote the
"target" library and the "subtractant" library). For
example, the target library may consist of cDNA sequences
from clones of a diseased cell, while the subtractant
library may consist of cDNA sequences from clones of the
diseased cell after treatment by exposure to a drug. For
another example, the target library may consist of cDNA
sequences from clones of a cell type from a young human,
while the subtractant library may consist of cDNA sequences
from clones of the same cell type from the same human at
different ages.

In this case, the "Tempdesig" operation routes all
transcript sequences representing the target library for
processing in accordance with "Templib" (and then "Libsort"
and "Temptarsort"), and routes all transcript sequences
representing the subtractant library for processing in
accordance with "Tempsub" (and then "Subsort" and
"Tempsubsort"). For example, the consecutive "Templib,"
"Libsort," and "Temptarsort" sorting operations sort
identified sequences from the target library in order of
decreasing abundance number (to generate a list of
decreasing abundance numbers, each abundance number
corresponding to a database entry, or several lists of
decreasing abundance numbers, with the abundance numbers in
each list corresponding to database entries of a selected
type) with redundancies eliminated from each sorted list.
The consecutive "Tempsub," "Subsort," and "Tempsubsort"
sorting operations sort identified sequences from the
subtractant library in order of decreasing abundance number
(to generate a list of decreasing abundance numbers, each
abundance number corresponding to a database entry, or
several lists of decreasing abundance numbers, with the
abundance numbers in each list corresponding to database
entries of a selected type) with redundancies eliminated
from each sorted list.

The transcript sequences output from the "Temptarsort"
operation typically represent sorted lists from which a
histogram could be generated in which position along one
(e.g., horizontal) axis indicates abundance number (of
target library sequences), and position along another
(e.g., vertical) axis indicates identified sequence value
(e.g., human or non-human gene type). Similarly, the
transcript sequences output from the "Tempsubsort"
operation typically represent sorted lists from which a
histogram could be generated in which position along one
(e.g., horizontal) axis indicates abundance number (of
subtractant library sequences), and position along another
(e.g., vertical) axis indicates identified sequence value
(e.g., human or non-human gene type).

The transcript sequences (sorted lists) output from
the Tempsubsort and Temptarsort sorting operations are
combined during the operation identified as "Cruncher."
The "Cruncher" process identifies pairs of corresponding
target and subtractant abundance numbers (both representing
the same identified sequence value), and divides one by the
other to generate a "ratio" value for each pair of
corresponding abundance numbers, and then sorts the ratio
values in order of decreasing ratio value. The data output
from the "Cruncher" operation (the Final Transcript
sequence in Fig. 2) is typically a sorted list from which a
histogram could be generated in which position along one
axis indicates the size of a ratio of abundance numbers
(for corresponding identified sequence values from target
and subtractant libraries) and position along another axis
indicates identified sequence value (e.g., gene type).
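The pairing, division and sorting performed by "Cruncher" can be sketched as below. This is a simplified illustration, not the Table 5 subroutine: it assumes each library is a mapping from database entry to a non-zero abundance number, divides target by subtractant (the text allows either direction), and considers only entries present in both libraries.

```python
def cruncher(target, subtractant):
    """Return (entry, ratio) pairs sorted by decreasing ratio.

    target, subtractant: dicts mapping database entry -> abundance number
    (assumed non-zero). Only entries found in both libraries are paired.
    """
    shared = target.keys() & subtractant.keys()
    ratios = [(entry, target[entry] / subtractant[entry]) for entry in shared]
    # Ties are broken alphabetically so the output order is deterministic.
    return sorted(ratios, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical abundance numbers.
target = {"gene_a": 12, "gene_b": 3, "gene_c": 5}
subtractant = {"gene_a": 4, "gene_b": 6, "gene_d": 2}
final = cruncher(target, subtractant)
```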

Preferably, prior to obtaining a ratio between the two
library abundance values, the Cruncher operation also
divides each ratio value by the total number of sequences
in one or both of the target and subtractant libraries.
The resulting lists of "relative" ratio values generated by
the Cruncher operation are useful for many medical,
scientific, and industrial applications. Also preferably,
the output of the Cruncher operation is a set of lists,
each list representing a sequence of decreasing ratio
values for a different selected subset (e.g., protein
family) of database entries.

In one example, the abundance sort program of the
invention tabulates for a library the numbers of mRNA
transcripts corresponding to each gene identified in a
database. These numbers are divided by the total number of
clones sampled. The results of the division reflect the
relative abundance of the mRNA transcripts in the cell type
or tissue from which they were obtained. Obtaining this
final data set is referred to herein as "gene transcript
image analysis." The resulting subtracted data show
exactly what proteins and genes are upregulated and
downregulated in highly detailed complexity.
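The division described in this example amounts to the following short computation, shown here as an illustrative sketch with hypothetical gene names and counts.

```python
def relative_abundance(counts, total_clones):
    """Divide each gene's transcript count by the number of clones sampled."""
    return {gene: n / total_clones for gene, n in counts.items()}


# Hypothetical counts for a library in which 5000 clones were sampled.
image = relative_abundance({"gene_a": 8, "gene_b": 5}, total_clones=5000)
```

Because each value is a fraction of the clones sampled, images computed from libraries of different sizes can be compared directly.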
6.6. HUVEC cDNA LIBRARY

Table 2 is an abundance table listing the various gene
transcripts in an induced HUVEC library. The transcripts
are listed in order of decreasing abundance. This
computerized sorting simplifies analysis of the tissue and
speeds identification of significant new proteins which are
specific to this cell type. This type of endothelial cell
lines tissues of the cardiovascular system, and the more
that is known about its composition, particularly in
response to activation, the more choices of protein targets
become available to affect in treating disorders of this
tissue, such as the highly prevalent atherosclerosis.

6.7. MONOCYTE-CELL [...]

Tables 3 and 4 show truncated comparisons of two
libraries. In Tables 3 and 4 the "normal monocytes" are
the HMC-1 cells, and the "activated macrophages" are the
THP-1 cells pretreated with PMA and activated with LPS.
Table 3 lists in descending order of abundance the most
abundant gene transcripts for both cell types. With only
15 gene transcripts from each cell type, this table permits
quick, qualitative comparison of the most common
transcripts. This abundance sort, with its convenient
side-by-side display, provides an immediately useful
research tool. In this example, this research tool
discloses that 1) only one of the top 15 activated
macrophage transcripts is found in the top 15 normal
monocyte gene transcripts (poly A binding protein); and 2)
a new gene transcript (previously unreported in other
databases) is relatively highly represented in activated
macrophages but is not similarly prominent in normal
macrophages. Such a research tool provides researchers
with a short-cut to new proteins, such as receptors,
cell-surface and intracellular signalling molecules, which
can serve as drug targets in commercial drug screening
programs. Such a tool could save considerable time over
that consumed by a hit and miss discovery program aimed at
identifying important proteins in and around cells, because
those proteins carrying out everyday cellular functions and
represented as steady state mRNA are quickly eliminated
from further characterization.

This illustrates how the gene transcript profiles
change with altered cellular function. Those skilled in
the art know that the biochemical composition of cells
changes with other functional changes such as cancer,
including cancer's various stages, and exposure to
toxicity. A gene transcript subtraction profile such as in
Table 3 is useful as a first screening tool for such gene
expression and protein studies.

6.8. SUBTRACTION ANALYSIS OF NORMAL MONOCYTE-CELL AND
ACTIVATED MONOCYTE CELL cDNA LIBRARIES

Once the cDNA data are in the computer, the computer
program as disclosed in Table 5 was used to obtain ratios
of all the gene transcripts in the two libraries discussed
in Example 6.7, and the gene transcripts were sorted by the
descending values of their ratios. If a gene transcript is
not represented in one library, that gene transcript's
abundance is unknown but appears to be less than 1. As an
approximation (and to obtain a ratio, which would not be
possible if the unrepresented gene were given an abundance
of zero), genes which are represented in only one of the
two libraries are assigned an abundance of 1/2. Using 1/2
for unrepresented clones increases the relative importance
of "turned-on" and "turned-off" genes, whose products would
be drug candidates. The resulting print-out is called a
subtraction table and is an extremely valuable screening
method, as is shown by the following data.
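The 1/2 substitution can be illustrated with a short sketch. It is not the Table 5 program: the function and parameter names are hypothetical, and each library is assumed to be a mapping from gene transcript to clone count.

```python
def subtraction_table(target, subtractant, floor=0.5):
    """Build rows (entry, target abundance, subtractant abundance, ratio).

    A transcript absent from one library (or counted as 0) is assigned an
    abundance of 1/2 so that a ratio can still be formed. Rows are sorted
    by descending ratio, with ties broken alphabetically.
    """
    rows = []
    for entry in target.keys() | subtractant.keys():
        t = target.get(entry, 0) or floor
        s = subtractant.get(entry, 0) or floor
        rows.append((entry, t, s, t / s))
    return sorted(rows, key=lambda row: (-row[3], row[0]))


# Hypothetical counts: gene_c is "turned on" and gene_b "turned off"
# in the target library.
table = subtraction_table({"gene_a": 10, "gene_c": 4},
                          {"gene_a": 2, "gene_b": 3})
```

With these counts the turned-on transcript (4 against 1/2, ratio 8) sorts above a transcript that is five-fold more abundant in the target library, while the turned-off transcript (1/2 against 3) falls to the bottom of the table.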

Table 4 is a subtraction table, in which the normal
monocyte library was electronically "subtracted" from the
activated macrophage library. This table highlights most
effectively the changes in abundance of the gene
transcripts by activation of macrophages. Even among the
first 20 gene transcripts listed, there are several unknown
gene transcripts. Thus, electronic subtraction is a useful
tool with which to assist researchers in identifying much
more quickly the basic biochemical changes between two cell
types. Such a tool can save universities and
pharmaceutical companies which spend billions of dollars on
research valuable time and laboratory resources at the
early discovery stage and can speed up the drug development
cycle, which in turn permits researchers to set up drug
screening programs much earlier. Thus, this research tool
provides a way to get new drugs to the public faster and
more economically.

Also, such a subtraction table can be obtained for
patient diagnosis. An individual patient sample (such as
monocytes obtained from a biopsy or blood sample) can be
compared with data provided herein to diagnose conditions
associated with macrophage activation.

Table 4 uncovered many new gene transcripts (labeled
Incyte clones). Note that many genes are turned on in the
activated macrophage (i.e., the monocyte had a 0 in the
bgfreq column). This screening method is superior to other
screening techniques, such as the western blot, which are
incapable of uncovering such a multitude of discrete new
gene transcripts.

The subtraction-screening technique has also uncovered
a high number of cancer gene transcripts (oncogenes rho,
ETS2, rab-2 ras, YPT1-related, and acute myeloid leukemia
mRNA) in the activated macrophage. These transcripts may
be attributed to the use of immortalized cell lines and are
inherently interesting for that reason. This screening
technique offers a detailed picture of upregulated
transcripts including oncogenes, which helps explain why
anti-cancer drugs interfere with the patient's immunity
mediated by activated macrophages. Armed with knowledge
gained from this screening method, those skilled in the art
can set up more targeted, more effective drug screening
programs to identify drugs which are differentially
effective against 1) both relevant cancers and activated
macrophage conditions with the same gene transcript
profile; 2) cancer alone; and 3) activated macrophage
conditions.

Smooth muscle senescent protein (22 kd) was
upregulated in the activated macrophage, which indicates
that it is a candidate to block in controlling
inflammation.

6.9. SUBTRACTION ANALYSIS OF NORMAL LIVER CELLS AND
HEPATITIS INFECTED LIVER CELL cDNA LIBRARIES

In this example, rats are exposed to hepatitis virus
and maintained in the colony until they show definite signs
of hepatitis. Of the rats diagnosed with hepatitis, one
half of the rats are treated with a new anti-hepatitis
agent (AHA). Liver samples are obtained from all rats
before exposure to the hepatitis virus and at the end of
AHA treatment or no treatment. In addition, liver samples
can be obtained from rats with hepatitis just prior to AHA
treatment.

The liver tissue is treated as described in Examples
6.2 and 6.3 to obtain mRNA and subsequently to sequence
cDNA. The cDNA from each sample are processed and analyzed
for abundance according to the computer program in Table 5.
The resulting gene transcript images of the cDNA provide
detailed pictures of the baseline (control) for each animal
and of the infected and/or treated state of the animals.
cDNA data for a group of samples can be combined into a
group summary gene transcript profile for all control
samples, all samples from infected rats and all samples
from AHA-treated rats.

Subtractions are performed between appropriate
individual libraries and the grouped libraries. For
individual animals, control and post-study samples can be
subtracted. Also, if samples are obtained before and after
AHA treatment, that data from individual animals and
treatment groups can be subtracted. In addition, the data
for all control samples can be pooled and averaged. The
control average can be subtracted from averages of both
post-study AHA and post-study non-AHA cDNA samples. If
pre- and post-treatment samples are available, pre- and
post-treatment samples can be compared individually (or
electronically averaged) and subtracted.
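One way to picture the pooling and averaging step is shown below. The sketch is illustrative only and makes assumptions the text does not state: each sample is represented as a mapping from gene transcript to relative abundance, a transcript missing from a sample counts as zero, and all samples are weighted equally.

```python
def average_profile(profiles):
    """Average several gene transcript profiles into one group profile.

    profiles: list of dicts mapping gene transcript -> relative abundance.
    A transcript absent from a profile contributes 0 to the average.
    """
    genes = set().union(*profiles)
    return {g: sum(p.get(g, 0.0) for p in profiles) / len(profiles)
            for g in genes}


# Hypothetical control samples from two animals.
control_average = average_profile([
    {"gene_a": 0.2, "gene_b": 0.1},
    {"gene_a": 0.4},
])
```

The resulting group profile can then stand in for a single library when a subtraction is performed against the treated or untreated groups.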

These subtraction tables are used in two general ways.
First, the differences are analyzed for gene transcripts
which are associated with continuing hepatic deterioration
or healing. The subtraction tables are tools to isolate
the effects of the drug treatment from the underlying basic
pathology of hepatitis. Because hepatitis affects many
parameters, additional liver toxicity has been difficult to
detect with only blood tests for the usual enzymes. The
gene transcript profile and subtraction provides a much
[...] picture which researchers have [...] difficult
problems.

Second, the subtraction tables provide a tool for
identifying clinical markers, individual proteins or other
biochemical determinants which are used to predict and/or
evaluate a clinical endpoint, such as disease, improvement
due to the drug, and even additional pathology due to the
drug. The subtraction tables specifically highlight genes
which are turned on or off. Thus, the subtraction tables
provide a first screen for a set of gene transcript
candidates for use as clinical markers. Subsequently,
electronic subtractions of additional cell and tissue
libraries reveal which of the potential markers are in fact
found in different cell and tissue libraries. Candidate
gene transcripts found in additional libraries are removed
from the set of potential clinical markers. Then, tests of
blood or other relevant samples which are known to lack and
have the relevant condition are compared to validate the
selection of the clinical marker. In this method, the
particular physiologic function of the protein transcript
need not be determined to qualify the gene transcript as a
clinical marker.
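The removal of candidates that also appear in other libraries is, in effect, a set difference. A minimal sketch, with hypothetical names and with each additional library reduced to the set of gene transcripts it contains:

```python
def filter_candidate_markers(candidates, other_libraries):
    """Drop candidate transcripts that occur in any additional library.

    candidates: iterable of gene transcript identifiers from the first screen.
    other_libraries: list of sets of transcripts found in other cell or
    tissue libraries.
    """
    seen_elsewhere = set().union(*other_libraries)
    return sorted(set(candidates) - seen_elsewhere)


# Hypothetical first-screen candidates and two additional libraries.
markers = filter_candidate_markers(
    ["gene_a", "gene_b", "gene_c"],
    [{"gene_b", "gene_x"}, {"gene_c"}],
)
```

Only the transcripts that survive this filter go forward to validation against samples known to have or to lack the condition.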

6.10. ELECTRONIC NORTHERN BLOT

One limitation of electronic subtraction is that it is
difficult to compare more than a pair of images at once.
Once particular individual gene products are identified as
relevant to further study (via electronic subtraction or
other methods), it is useful to study the expression of
single genes in a multitude of different tissues. In the
lab, the technique of "Northern" blot hybridization is used
for this purpose. In this technique, a single cDNA, or a
probe corresponding thereto, is labeled and then hybridized
against a blot containing RNA samples prepared from a
multitude of tissues or cell types. Upon autoradiography,
the pattern of expression of that particular gene, one at a
time, can be quantitated in all the included samples.

In contrast, a further embodiment of this invention is
the computerized form of this process, termed here
"electronic northern blot." In this variation, a single
gene is queried for expression against a multitude of
prepared and sequenced libraries present within the
database. In this way, the pattern of expression of any
single candidate gene can be examined instantaneously and
effortlessly. More candidate genes can thus be scanned,
leading to more frequent and fruitfully [...]
discoveries. The computer program included in Table 5
includes a program for performing this function, and Table
6 is a partial listing of entries of the database used in
the electronic northern blot analysis.
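The query underlying an electronic northern blot can be sketched as a lookup of one gene across every stored library. The sketch is not the Table 5 program; the data layout (library name mapped to a pair of per-gene clone counts and total clones sequenced) and all names and numbers are assumptions made for illustration.

```python
def electronic_northern(gene, libraries):
    """Report one gene's count and relative abundance in every library.

    libraries: dict mapping library name -> (counts, total_clones), where
    counts maps gene -> number of matching clones (assumed layout).
    Rows are returned with the highest relative abundance first.
    """
    rows = []
    for name, (counts, total_clones) in libraries.items():
        n = counts.get(gene, 0)
        rows.append((name, n, n / total_clones))
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Hypothetical database of three sequenced libraries.
database = {
    "library_1": ({"gene_a": 8}, 5000),
    "library_2": ({"gene_a": 2}, 1000),
    "library_3": ({}, 2000),
}
blot = electronic_northern("gene_a", database)
```

Reporting relative abundance rather than raw counts keeps libraries of different sizes comparable, which is why the library with 2 of 1000 clones ranks above the one with 8 of 5000.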

6.11. PHASE I CLINICAL TRIALS

Based on the establishment of safety and effectiveness
in the above animal tests, Phase I clinical tests are
undertaken. Normal patients are subjected to the usual
preliminary clinical laboratory tests. In addition,
appropriate specimens are taken and subjected to gene
transcript analysis. Additional patient specimens are
taken at predetermined intervals during the test. The
specimens are subjected to gene transcript analysis as
described above. In addition, the gene transcript changes
noted in the earlier rat toxicity study are carefully
evaluated as clinical markers in the followed patients.
Changes in the gene transcript analyses are evaluated as
indicators of toxicity by correlation with clinical signs
and symptoms and other laboratory results. In addition,
subtraction is performed on individual patient specimens
and on averaged patient specimens. The subtraction
analysis highlights any toxicological changes in the
treated patients. This is a highly refined determinant of
toxicity. The subtraction method also annotates clinical
markers. Further subgroups can be analyzed by subtraction
analysis, including, for example, 1) segregation by
occurrence and type of adverse effect; and 2) segregation
by dosage.

A gene transcript imaging analysis (or multiple gene
transcript imaging analyses) is a useful tool in other
clinical studies. For example, the differences in gene
transcript imaging analyses before and after treatment can
be assessed for patients on placebo and drug treatment.
This method also effectively screens for clinical markers
to follow in clinical use of the drug.

The subtraction method can be used to screen cDNA
libraries from diverse sources. For example, the same cell
types from different species can be compared by gene
transcript analysis to screen for specific differences,
such as in detoxification enzyme systems. Such testing
aids in the selection and validation of an animal model for
the commercial purpose of drug screening or toxicological
testing of drugs intended for human or animal use. When
the comparison between animals of different species is
shown in columns for each species, we refer to this as an
interspecies comparison, or zoo blot.

Embodiments of this invention may employ databases
such as those written using the FoxBASE programming
language, commercially available from Microsoft Corporation.
Other embodiments of the invention employ other databases,
such as a random peptide database, a polymer database, a
synthetic oligomer database, or an oligonucleotide database
of the type described in U.S. Patent 5,270,170, issued
December 14, 1993 to Cull, et al., PCT International
Application Publication No. WO 9322684, published November
11, 1993, PCT International Application Publication No. WO
9306121, published April 1, 1993, or PCT International
Application Publication No. WO 9119818, published December
26, 1991. These four references (whose text is
incorporated herein by reference) include teaching which
may be applied in implementing such other embodiments of
the present invention.

All references referred to in the preceding text are
hereby incorporated by reference herein.

Various modifications and variations of the described
method and system of the invention will be apparent to
those skilled in the art without departing from the scope
and spirit of the invention. Although the invention has
been described in connection with specific preferred
embodiments, it should be understood that the invention as
claimed should not be unduly limited to such specific

TABLE 2

Clone numbers 15000 through 20000
Libraries: HUVEC
Arranged by ABUNDANCE
Total clones analyzed: 5000

319 genes, for a total of 1713 clones
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descriptor 

Riboptn L41 

INCYTE 015004 

INCYTE 015638

INCYTE 015390 

Fibronectin 

Riboptn L9 

INCYTE 015280 

EST HHCH09 (IGR) 

Actin, gamma

INCYTE 015026 

Elf 1-alpha 

INCYTE 015027 

INCYTE 015033 

INCYTE 015198 

Collagenase 

INCYTE 015221 

INCYTE 015263

INCYTE 015290

INCYTE 015350

INCYTE 015030

INCYTE 015234

INCYTE 015459

INCYTE 015353

Ptn kinase inhib

Thymosin, beta-4

Lipocortin

Poly-A bp

Thymosin, alpha

Motility relat ptn; MRP-1; CD-9

Interferon induc ptn 1-8D

FK506 bp

Histone H2A

Lectin, B-galbp, 14kDa

INCYTE 015789

Riboptn [illegible]

EST HHCA13 (IGR)

INCYTE 018314 . . 

INCYTE 015367 - . ' ' 

Interferon induc mRNA

Lactate dehydrogenase 

C Myosin heavy chain B 

INCYTE 018210 

RNA polymerase II 

INCYTE 018996 

Ferritin, light chain 

INCYTE 015714 

INCYTE 015720 

INCYTE 015863 
Endothelin 

INCYTE 018252 
Lipid bp, adipocyte 

INCYTE 015370 
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TABLE 7 Can't 



      number         entry       descriptor

 53   15670      8   BTCIASHI    NADH-ubiq oxidoreductase
 54   15795      8   NCY015795   INCYTE 015795
 55   16245      8   NCY016245   INCYTE 016245
 56   18262      8   NCY018262   INCYTE 018262
 57   18321      8   HSRPL17     Riboptn L17
 58   15126      7   XLRPL1BRF   Riboptn L1
 59   [illeg.]   7   HSAC07      Actin, beta
 60   15245      7   NCY015245   INCYTE 015245
 61   15288      7   NCY015288   INCYTE 015288
 62   15294      7   HSGAPDR     G-3-PD
 63   15442      7   HUMLAMB     Laminin receptor, 54kDa
 64   15485      7   HSNGMRNA    Uracil DNA glycosylase
 65   16646      7   NCY016646   INCYTE 016646
 66   18003      7   HUMPAIA     Plsmnogen activ gene
 67   15032      6   HUMUB       Ubiquitin
 68   15267      6   HSRPS8      Riboptn S8
 69   15295      6   NCY015295   INCYTE 015295
 70   15458      6   KNRPS10R    Riboptn S10
 71   15832      6   RSGALEM     UDP-galactose epimerase
 72   15928      6   HUMAPOJ     Apolipoptn J
 73   16598      6   HUMTBBM40   Tubulin, beta
 74   18218      6   NCY018218   INCYTE 018218
 75   18499      6   HSP27       Hydrophobic ptn p27
 76   18963      6   NCY018963   INCYTE 018963
 77   18997      6   NCY018997   INCYTE 018997
 78   15432      5   HSAGALAR    Galactosidase A, alpha
 79   15475      5   NCY015475   INCYTE 015475
 80   15721      5   NCY015721   INCYTE 015721
 81   15865      5   NCY015865   INCYTE 015865
 82   16270      5   NCY016270   INCYTE 016270
 83   16886      5   NCY016886   INCYTE 016886
 84   18500      5   NCY018500   INCYTE 018500
 85   18503      5   NCY018503   INCYTE 018503
 86   19672      5   RRRPL34     Riboptn L34
 87   15086      4   XLRPL1AR    Riboptn L1a
 88   15113      4   HUMIFNWRS   tRNA synthetase, trp
 89   15242      4   NCY015242   INCYTE 015242
 90   15249      4   NCY015249   INCYTE 015249
 91   15377      4   NCY015377   INCYTE 015377
 92   15407      4   NCY015407   INCYTE 015407
 93   15473      4   NCY015473   INCYTE 015473
 94   15588      4   HSRPS12     Riboptn S12
 95   15684      4   HSEF1G      Elf 1-gamma
 96   15782      4   NCY015782   INCYTE 015782
 97   15916      4   HSRPS18     Riboptn S18
 98   15930      4   NCY015930   INCYTE 015930
 99   16108      4   NCY016108   INCYTE 016108
100   16133      4   NCY016133   INCYTE 016133
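
An abundance-sorted listing of the kind shown in Table 2 can be imitated in a few lines. The following is a minimal sketch in Python, not the FoxBASE+ program of this specification. It assumes that every sequenced clone has already been matched to a database entry; the function name `abundance_table` and the sample clone list are hypothetical and serve only as illustration.

```python
from collections import Counter


def abundance_table(clone_entries):
    """Tally clones per matched entry and sort by descending abundance.

    clone_entries: iterable of (clone_number, entry) pairs (hypothetical input).
    Returns a list of (entry, count), highest count first, ties broken by entry.
    """
    counts = Counter(entry for _, entry in clone_entries)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical sample: five clones that match three different entries.
sample = [
    (15001, "HSRPS8"),
    (15002, "HUMUB"),
    (15003, "HSRPS8"),
    (15004, "NCY015295"),
    (15005, "HSRPS8"),
]
table = abundance_table(sample)
genes = len(table)
clones = sum(count for _, count in table)
print(f"{genes} genes, for a total of {clones} clones")
```

The summary line mirrors the "genes, for a total of ... clones" heading of the table; the counts printed here come from the invented sample, not from the HUVEC library.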
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TABLE 4 



Libraries: THP-1
Subtracting: HMC
Sorted by ABUNDANCE
Total clones analyzed: 7375

[illegible] genes, for a total of 2151 clones

descriptor

number   entry
10036
10089
HSMDNCF
HSIJGICDN
10003
10689
11050
10937
10176
10886
10186
10967
11353
10298
10215
10276
10488
11138
10037
10840
10672
12837
10001
10005
10294
10297
10403
10699
10966
12092
12549
10691
12106

10194 

10479 

10031 

10203 

10288 

10372 

10471 

10484 

10859 

10890 

11511 

11868 

12820 

10133 

10516 

11063 

11140 

10788 

10033 

10035 

10084 

10236 

10383 



HUMMIPIA
HSOP
NCY011050
HSTNFR
HSSOD
HSCDW40
HUMAPR
HUMGDN
NCY011353
NCY010298
HUM4COLA
NCY010276
NCY010488
NCY011138
HUMCAPPRO
HUMADCY
HSCD44E
HUMCYCLOX
NCY010001
NCY010005
NCY010294
NCY010297
NCY010403
NCY010699
NCY010966
NCY012092
HSRHOB
HUMARFIBA
HSADSS
HSCATHL
CLMGYCA
NCY010031
NCY010203
NCY010288
NCY010372
NCY010471
NCY010484
NCY010859
NCY010890
NCY011511
NCY011868
NCY012820
HSIIRAP
HUMP2A
HUMB94
HSHB15RNA
NCY001713
NCY010033
NCY010035
NCY010084
NCY010236
NCY010383



IL-1 beta
IL-8

Lymphocyte activ gene
RANTES
MIP-1

Osteopontin 
INCYTE 011050 
TNF-alpha 

Superoxide dismutase 

B-cell activ, NGF-relat 

Early resp PMA-induc 

PN-1, glial-deriv 

INCYTE 011353 

INCYTE 010298 

Collagenase, type IV 

INCYTE 010276 

INCYTE 010488 

INCYTE 011138 

Adenylate cyclase 

Adenylate cyclase 

Cell adhesion glptn 

Cyclooxygenase-2 

INCYTE 010001 

INCYTE 010005 

INCYTE 010294 

INCYTE 010297 

INCYTE 010403 

INCYTE 010699 

INCYTE 010966 

INCYTE 012092 

Oncogene rho 

ADP-ribosylation fctr 

Adenylosuccinate synthetase 

Cathepsin L 

Cyclin A 

INCYTE 010031

INCYTE 010203 

INCYTE 010288 

INCYTE 010372 

INCYTE 010471 

INCYTE 010484 

INCYTE 010859 

INCYTE 010890 

INCYTE 011511 

INCYTE 011868 

INCYTE 012820 

IL-1 antagonist 

Phosphatase, regul 2A 

TNF-induc response 

HB15 gene; new Ig 

INCYTE 001713 

INCYTE 010033 

INCYTE 010035 

INCYTE 010084 

INCYTE 010236 

INCYTE 010383 



bgfreq rfend ratio



0 

0 

0 
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0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

1 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 



131 
119 
71 
23 
121 
20 
17 
17 
14 
10 
9 
9 
8 
7 
6 
6 
6 
6 
10 
5 
5 
5 
5 
5 
5 
5 
5 
5 
5 
5 
5 
4 
4 
4 
4 
4 
4 
4 
4 
4 
4 
4 
4 
4 
4 
4 
4 
4 
4 
3 
3 
3 
3 
3 
3 
3 



262.00
238.00 
142.00 
46.000 
40.333 
40.000 
34.000 
34.000 
28.000 
20.000 
18.000 
18.000 
16.000 
14.000 
12.000 
12.000 
12.000 
12.000 
10.000 
10.000 
10.000 
10.000 
10.000 
10.000 
10.000 
10.000 
10.000 
10.000 
10.000 
10.000 
10.000 
8.000 
8.000 
8.000 
8.000 
8.000 
8.000 
8.000 
8.000 
8.000 
8.000 
8.000 
8.000 
8.000 
8.000 
8.000 
8.000 
8.000 
8.000 
6.000 
6.000 
6.000 
6.000 
6.000 
6.000 
6.000 
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TABLE 4 Con't 



number   entry       descriptor

10450    NCY010450   INCYTE 010450
10470    NCY010470   INCYTE 010470
10504    NCY010504   INCYTE 010504
10507    NCY010507   INCYTE 010507
10598    NCY010598   INCYTE 010598
10779    NCY010779   INCYTE 010779
10909    NCY010909   INCYTE 010909
10976    NCY010976   INCYTE 010976
10985    NCY010985   INCYTE 010985
11052    NCY011052   INCYTE 011052
11068    NCY011068   INCYTE 011068
11134    NCY011134   INCYTE 011134
11136    NCY011136   INCYTE 011136
11191    NCY011191   INCYTE 011191
11219    NCY011219   INCYTE 011219
11386    NCY011386   INCYTE 011386
11403    NCY011403   INCYTE 011403
11460    NCY011460   INCYTE 011460
11618    NCY011618   INCYTE 011618
11686    NCY011686   INCYTE 011686
12021    NCY012021   INCYTE 012021
12025    NCY012025   INCYTE 012025
12320    NCY012320   INCYTE 012320
12330    NCY012330   INCYTE 012330
12853    NCY012853   INCYTE 012853
14386    NCY014386   INCYTE 014386
14391    NCY014391   INCYTE 014391
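
The ratio column of Table 4 appears to be the abundance in the target library (rfend) divided by the abundance in the subtracted library (bgfreq), with a value of 1/2 standing in for the denominator when the entry is absent from the subtracted library. That reading rests on the program listing in this specification and on rows such as rfend 131 with ratio 262.00. The Python sketch below illustrates the arithmetic only; the function names, the entry labels, and the background counts are hypothetical.

```python
def subtraction_ratio(rfend, bgfreq):
    """Target abundance divided by background abundance.

    Assumption: an entry absent from the background library (bgfreq == 0)
    is given a denominator of 1/2 so that the ratio stays finite.
    """
    actual = bgfreq if bgfreq > 0 else 0.5
    return rfend / actual


def subtract(target_counts, background_counts):
    """Return (entry, bgfreq, rfend, ratio) rows, highest ratio first."""
    rows = []
    for entry, rfend in target_counts.items():
        bgfreq = background_counts.get(entry, 0)
        rows.append((entry, bgfreq, rfend, subtraction_ratio(rfend, bgfreq)))
    # Sort by ratio (descending), then bgfreq (descending), then entry name.
    rows.sort(key=lambda row: (-row[3], -row[1], row[0]))
    return rows


# Hypothetical counts chosen to reproduce three ratios of the kind tabulated.
target = {"ENTRY_A": 131, "ENTRY_B": 121, "ENTRY_C": 10}
background = {"ENTRY_B": 3, "ENTRY_C": 1}
for entry, bgfreq, rfend, ratio in subtract(target, background):
    print(f"{entry:8s} {bgfreq:3d} {rfend:4d} {ratio:8.3f}")
```

Using a fixed half count for absent entries keeps transcripts that are unique to the target library at the top of the list while still ranking them by their own abundance.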
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TABLE g 



Master' nenu for out^ 

SET TALK OFF

SET SAFETY OFF

SET EXACT ON

SET TYPEAHEAD TO 0
SET DEVICE TO SCREEN

USE "SmartGuy:FoxBASE+/Mac:fox files:Clones.dbf"

GO TOP
STORE NUMBER TO INITIATE
GO BOTTOM

STORE NUMBER TO TERMINATE



^ • TO ObJectl 

STORE 0 TO ANAL

STORE 0 TO EMATCH

STORE 0 TO HMATCH
STORE 0 TO OMATCH
STORE 0 TO IMATCH
STORE 0 TO PTF
STORE 1 TO BAIL

DO WHILE .T.
* Program.: Subtraction 2.fmt

* Version.: FoxBASE+/Mac, revision 1.10

* Notes...: Format file Subtraction 2



^FfS^^^ -Screen 1* 40,2 5128 286,432 PIXELS POWT •Qeaeva- 9 avnk o n n 
2 ?JA^1i SWT ;-eubtractiDn Menu- SHLE fishe font -C^wvaV^ COLOR 0 0 -i i • i i 

a PIXELS 252,336 GET tennlrate SWIE 0 ROT "^^iria^B 15 70^c^ oV^;"*r^'"^ 

f TO "1.3" ffiaS 3871 COLOR 0,0.-X/-2566o,il^l ^ ' ■ 

ft PIXEL S SKI 'Baekground:* snXfi 65536 mor 'fisiev^' carmt n a i i < m 

6 45.135 GET ANM. SmS 65536 faff^iSoVoTS^ .e^fL^ 

•e VSXSLS 135,20 GET t»rg«t2 sntLB 0 PtBW "eeneva'.S SIZE 12 79 o n i i , ? ^ ' ' 

8 VSSaS 162,299 GET objects STftE 0 PONT ■Sauswa- 9 SlS 12^79 cSS 0 o'Zi i ' J'"^ 
■8 Pnas 276,324--CBr-Bail ST5M 65536 POMP .aicasi?,"'?!^ ^sSifBidf KZS ttl2 



* EOF: Subtraction 2.fmt
READ

IF Bail=2

CLEAR

CLOSE DATABASES

USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
SET SAFETY ON
SCREEN 1 OFF
RETURN
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saOfl£:0PPSR.(Tarsret2) to Targets • ' - -^v^- -x- .r.7..?o.vY.v t y^^ j?...,; 
fiTOIffl;;tJPPBR{Target3) .TO l&rget3 . , , . 

fiTOlffi OTPER{gi3aect2).10'Obj€Ct2 . . - . 

O0^nTT:TO TOT . . 

COPY TO TEMPRED PpR:tb'E\Cft;i^ ' ^ -f'' r*^: v-f^L^■ v-:;: 

USB TEMPBSD • .1 .'w --.i.j 

IP SitatciisO ^ATO. Ktaacih^o iMTO. ..onat^ -U/.. - j ib*: 

cDPv;,To TEMWEsio, , ,v ,{ .r,:,'.-^ ■ . . . 

coyy s repcro ro to tempctsig . ■ At-^. v:^..--- .7 

USB .TOMPDESIG , . . 

g Baat dhyl *^ ^ - ^.-.-'iv' ^/^'.l, ^CN- ;;. .-r 

fAFVEKD FROM IQIENQM EOR 09 'B' 

APPEND PROM TEMEWOM FOR , Dsi'K' 

IP tteatxhil - i ' . 



2P ,ln»tch«l 
APFE^ P 



APHOT PROM :TEMENdM for ' J^^^ " 



COQNT TO 8XARTOT . . - -^r - . .^^ 

copy ffXRaCTORE TO TEMPLIB 
V8B TBgLTB • • 

APPEND FROH TEMPDBSIG FOR libStxynVPS^itBXQ^tl) 

IF targebao' , . • 

APParo^FROM raMPDESlO FO^ li3brary:iOPPBR(to^ v-^ ' ; v 



APPEND FROM. TEYIPDESXG FOR libraxy>DFF£R(targttt3) 
IM DIF . . . . , 

USE TgMFPzsiG ; - r-./^ "-' yr 

OOPr SlROCIORE TO ^EKPSUB - vO/f .i"; 

USB TEtSPS^ , . V . 

A»a© FROM ..raOTBSIO FOR' libraiysUPPE^^ 

I P ta ygtttao' . ' 

APPBK D FROM TEaffiPESJG FOR, liteary*UPPER'(0bject2) ; ^ V. ' - 

.jtfPSMD PRflM IHiErasIG FOR litearyaU 

COUNT TO 6UBTOACT0T ■ . 

SSr TftUC CEP " „ ' ^ * ■ '"• ^ • ' 



* COMPRESSION SUBROUTINE A
? 'COMPRESSING QUERY LIBRARY'
USE TEMPLIB
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iXOCROT ;TO7lDGENB.T\V::i.u.- -;-rr-i. ^- ^y. v-, .,v . v ^ , . - 

: DO^TOILB.-SW2«0 HOU# " ' ' '^^ - --j r^^i-'^^^ 

OOONT TO ADHZQUE ; : • ' -vi - '^a.*, . -f ^ 

COP r 1 

,,;6TWB Sroy.TO-rem - - c ■ • 

STORE D TO ■nBSiG&-. ' " " ^--^*^V ^>^>v. =^.r;,o.. ' caok 

-IP lESlA B.TESTB.AND,DSStGABa3ESIGB . . , ' 

V-.fDBIiBlE- J^;'!-- ^ . v r--?-. ; - 

.IJOOP . . " 



HftRKl - MftRra+tOP'*-- " • " - * ^ ^^^v- 

■LOOP ' ... ;--'^^*-' :'''\ ' -'-^ '''■^'*- * ' , 

SORT ON RFEND/D,NUMBER TO TEMPTARSORT
USE TEMPTARSORT

*REPLACE ALL START WITH RFEND/IDGENE*10000
COUNT TO TEMPTARCO

USE TEMPSUB

SORT ON ENTRY,NUMBER TO SUBSORT



COUNT TO SUBCLONE

REPLACE ALL RFEND WITH 1

MARK1 = 1
SW2=0

DO WHILE SW2=0 ROLL
IF MARK1 >= SUBCLONE
PACK

COUNT TO BUNIQUE

LOOP
ENDIF
GO MARK1

DUP = 1

STORE ENTRY TO TESTA

STORE D TO DESIGA
DO WHILE SW=0 TEST

STORE ENTRY TO TESTB

STORE D TO DESIGB
IF TESTA = TESTB .AND. DESIGA=DESIGB



WO 95/20681 PCT/US95/01160



GO'MAR>C l>^-'''_'' 

WZ77{ Z3CJP 

LOOT: : 00^ . "Vj ir^-.''- " ' 

SORT ON RFEND/D,NUMBER TO TEMPSUBSORT
USE TEMPSUBSORT

*REPLACE ALL START WITH RFEND/IDGENE*100000
COUNT TO TEMPSUBCO

7 ' Gasems rmsa ubrahibb * • ? . • oa:^;-: rn^--- cJ-. ; i '\ ieK.-'^':^''"" v.^^^- v 

OSg S OSITOCTia M :>s/;,.;v^ S^';s}.: -istrv '^j^J fi'i ^1 /^^i-f" .v\:. 

OCUOT. TOrajULOOT ':-^v -OXo;.. C ;* -^ xaii 655'^^' rv-'v''- 'V/- • /V' . * 'r-; Vf^V. ■ 

DO rans iTt, ^ " . . V ' . I ' 

MARK « HaWC+1 
ZF KARIOBMLOOT 
UXXT 



STORE ENTRY TO SCANNER
SELECT 3

LOCATE FOR ENTRY=SCANNER
IF FOUND()

STOBZ -Wmb TO -BITl 
STORE ^H^XZO)' TO -BZA - 



STORE 1/2 TO BIT1
STORE 0 TO BIT2



REPLACE BGFREQ WITH BIT2
REPLACE ACTUAL WITH BIT1

RFIJT T r ^' i' - • 
REPLACE ALL RATIO WITH RFEND/ACTUAL
? 'DOING FINAL SORT BY RATIO'
SORT ON RATIO/D,BGFREQ/D,DESCRIPTOR TO FINAL
USE FINAL

set talk off

DO CASE

CASE PTF=0
SET DEVICE TO PRINT
SET PRINT ON

EJECT

CASE PTF=1

SET ALTERNATE TO "Adenoid Patent Figures:Subtraction.txt"
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SBffiCASE 



n^;pxzn?sN&f 86400 ' w .FanoMs 



^ZCEE OCSa>SEC/60 TO OQKIMZN 

-sr'^Ku^sr TO lo 

gJ^MT •Llhrarr.a^^ toalysia- ST5M 6S536 PCOT -Geneva'* 274 CXILOR CCO,-!,-! 

^rmtzu. . - 

VWjBTR(ramATE, S, 0) 
a? r'. fc hrcmg h * . ' 
J7;finit3ZRM3NRTS, 6, 0; 

IP.»arget3o' 
??. »,;» 

ENEC? ri;. V?- r^i^^- 



:? -'fiubtracting; 
7? Objects 



IP C3bjeGt3o' 
?7 »/ • 
77 Objects 



7 ''DesiQxsatieosr .* 

??■ 'AND, »natch=0 .AND. Otatch^O" .AND. 1MATC8=0 



IF ESiBtdhal 

7? 'acaet, • 



IP ttwtchsa" 
7?_ 'H uman. ' 
WCtlT 

'IF Qnatdhal 
7? 'Other cp.» 
&83ZF 

7.7 'INOTE' 



7 'Sorted hy ABONCANCE'- 

BM3I?. = 

IF 

7 'Arraaged ty TONCKCN* 
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7 '?otaX oloaes roprsaentedi ■ 

7 'Votal *cloxies analyaeds ' 
7? £m(61ARIIOr#5|0) 
? 'TDbaI.co!tputfttion..tim; 
.?? STR{C0M5MIN*5,3) ' 
?? * mizAitaa> * 

r.vjp'ttd ifdft&flaajiim^^^ diatrlbution t = location r i function b * «p«ci«s 1 b inte 
>J iifiCRSESr; 1 TOPE 0 "Soxeen 1* AT 40»2 'dXZE 286,4^ PXXSL6 FOOT *GeDevaS9 COLOR 0.0 0 

C^.- ■ ' : r'.'' .genesV' :f or.'a--total ..of , , . . - . , - - . 

;- '•. .:: ?, ^. - . 

X- ^V.SCaEEN l TlfPE 'O KEAZmO ■Screen !■ AT 40,2 SIZE 286,492 PIXELS FOOT •Geneva*, 7 O0W?R 0.0.0. 
^,>; ;,Uflt OCT fields inzniber,O,r.Z,Rjm7«Sf,S,IIEeCRI?T0R,BC^^ 
;y;SBr piCE^ ■ ' * . 

*\ 9^: -USE. ■fisiart&uyiFoxaASE4-/Mac!!fax files i clones, dbf* 

• • • arrange/ functioD 

" 'V' ' SEP PRIMP' W 

• ' SET UEftDTOrCM 

. "^^^^^^^^^ HBRDIMS •Screen I'.AT 40,2 SIZE 286,492 PDffiLS *POWr ■Helvetica" , 268 COLOR 0 

saXBm i TSnw O.KBNDINO •Screen l?'AT 40,2 SIZS 266,492 PIXELS PQNP ■Helvetica •.2 65 COLOR 0 
: . 7 'gur faoB nolecules and reccptorci' • wwn v 

SCREEN 1 TfPZ :0 HEADIMS "Screen 1" AT 40,2 3122 286,492 PI3CELS FORT •Qeneva^.7 COtCT 0 0 0 

'SCraN l TSfPE 0 HERDIKG ■Screen V AT 40,2 SIZE 286,492 PIXELS .PONT •Helvetica- ,265 COLOR 0 
" ? ■♦Calcium-hiading^proteiiiBi*- - i-:--. • . , /(.,.-, ^ X 

'/-SCREEN l-TSrsa 0 HEADHre •screen 1" AT 40,2'filZE 2e6ri92 PIXELS FONT ■Geneva*. 7 COLOR 0,0,0. 
' list 'OFF fieldfi^ 

TSfPE 0 KEftDING -TScreeti I' AT^ 40, 2 SIZE 2B6,492:P1XELS FaOT:-Kelvetica^f265 COLCai 0 
" ' - ? !Lignnr3ti *and effectors iV ; ■ S- ' t r, i.*^- ■ 

vfiCBBEN liKPE:« KEtoING •Screen. !• AT. 40. 2 SIZE 2fiS,492 PDCKLS ! FONT »(teneva*,7 COtLOR O.O.Oi • 
J, r.Ust OFF 1 fields nanberrD^:FrZ,R,aim,8,nESCRip^ ^igi . , ' 

- / SCREEN i;iYPE b HSACmre. 'Screen' AT 40,2 .SIZE. 286, 493 PlXBt^ ! ^FtWT^ •Helv^^ *0 
/. 7 'Oth er blndlaig.protaiTist ' , : 

SCREEN 1 TSfPE -O.HEAniMS •Screen 1* AT'40,2 SIZE 286, 492 PIXELS, EWT ■Geneva*, 7 COLOR ; 0,0 tO;, 
:list COT Cieldfl;xn»te,l),F;2,R,: 'FOR R=»I' • ' 

7 '■ • ' ' * '. s '■ • . .'..t d * ■ . . . /-'id ' ; • 

6CRSN i,typE 0 HERDING , • Screen 1' AT;40,2 SI2E^28g,492 PIXBLS FONT 'Helvetica " ,2 68 ^COLCER 0 

7 ■ : ■• ■ » , ■ • , OKCOSBNES* v.^i . ^ 

7 . • . . • [i 

-SCRXESI 1 TirpS:0.KSADIN2 'Screen 1' AT. 40. 2 SIZE 286,492 PIXELS FCtn'' •Helvetica •.265 COLOR 0 

7 'General oftcogenea I* . ^ . v 

SCREEN 1 WPB 0. KEW3IMQ ■Screen !• AT .40, 2 SIZE 286,492 PIXELS .PDNT •Geneva^*,? COLOR 0,0.0. 

list OPP fields cmtecr,D,P,Z,R,EJnilY,S,DESaiIPTOR,BGPREQ,REab FDR Ro*0' 

SCREEN 1 lypE 0 KEADIKSs "Screen 1' AT 40,2 SIZE 286,492 PIXELS FOOT *Helvetica',265 COLOR 0 
7 'CTTP-biading proteins I • • ■ • 

SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FOOT ■Geneva'*,? COLOR 0)0.0 
list OFF fields nusiber,0,F,Z,R,BirRV,S,t3ESCRIPI0H,BSPREQ,RFSND,RATI0,X FOR Ra*0' 
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WO95Q0681 PCT/DS95©U«) 
SCHE37 1 TVPB 0 BEADING ■Scrsen !• M AO 3 evrc nac ab-% 

1 'Viral alenantBr^^ 4a,2 size 286,452 PIXELS font -Halv«tlea',a65 OOKHl 0 

KMSNi WEE 0 Hssnrw; -screen !• at. «o,2 size aas, 453" pixels fow >(£as& 7/inf. «' 
Xlst dPF fields watoer.D,F,z,R,an»,s,DESCRiP«R,B6MBQf^ 

fSies'^^P^J^r "''.^ ™^ «^ -Helvetica..2« «r«^ 0 

v 'jSCREEN I'TVES 0 HEADING ■Screen !• A!P 40 2 fllzw 5ftif -^o^ ^ _ 

-IW-relatcd aitigensf- ' ™ ^^^'^^^ TIXSL3 VWt -Helvetica -,2^5 COUQR 0 

' SCREEN 1 TlfPE 0 HS?^1KG .'fictBen 1* AT 40.2 STZE 3R£ btvwu e^si.— . 
ll*t OFF -fields a»a«:.p.P.2,R.S«i5fs!Jilc^J?^f^.«^^ 0.0.0. 

SCRSai a TXPE'.O REMmsS *Screaa !■ AT 40,2 SIZE 2d£ 49S ptvptb mm* .^s ^ ^. 
I - v/rv. -PROTEIN SVmETIC M&CHlKERY^llS^ ^ 'Helvetica-, 268 CQIOR 0 

?i SCREEN 1 TVPB 0 HEAIXDX} ■acre«i !• AlP 40 2 grrg ooe ptv^t . . ^_ '_ 

-.7 'fteanecxiptim axui IteleS Sid-^dini pS?^^ ™^ ^ •Helvetica- .265 COLOR 0 

vSCHEEN 1 TSfPB O BEADOMG ^Screen AT~40 2 fiTZE a« Aon srmii ' t^s.^ . 
,.Xift,<»F fields nuna»r.D;I^Xwri^!sfSdci^;i5^f^,S.;^^ 

?;|^'i2SL'?.^°" ^ "e'«2 Fim^.-FOW .HelWica..a65 COLoi.O 

;-;SCREQI 1 TSfPE 0 HEASaQSj •Screen 1* AT 40 2 srrv ^tt£ ao^ 

;,ai»f OFP -^ieias «»iter;DS 0,0,0.- 

■ :f^^^^^^ .^.* mas .Helv.tl«..26S cbLOR 0 

' SCREEN 1 rap£ 0: BBADINS 'Screen 1* AS" 40 ? ^rrv one ita*) ht 

ai.t,QPF fields x«»i«*.D;F;i!^B^3fgEic^?^!S,«^««5, 0.0,0. 

, f "StiiTS^eSS? SIZE 286,492 PIXELS F«W .Kelvetl«.,-265 CCMR 0 

SCPHN 1 vm 0 HEADI>« -Screen 1- AT size 286,493 PfiSLS. KOT •Helv«tica..268 C0Lc« 0 



«»^1^X|P=^H^^ -screen !• AT 40,2 81^ 286.492 PIXELS Fxm -«elveclce.,26S COLOR 0 

SCREEN X TSTPB 0 HEADING ■ficreen 1" AT 40 2 stze 5flfi btvst* 

list :iiol^;?«^,l).P.zX2U:;fgES<S^ 0.0.0, 

rf^LH^aSdSlt:^'"-*' *' ^ 286.492'm^ FONT -H^vetivae^^oiUW 0 

SCREBJ 1 TyPE 0 HEADI^ "Screen !• AT 40.3 ST21!: aqi «Tvt7r« . 

list OBT fields .n»mber;O.P.Z.R.^!sScSI?l^fS.R^.i'^^ 

f^ditI^5^6SS?.;fS?f . ^* ^ •Helvetl«..265 COLOR 0 

Se^ENl T»B 0 RSffiOlK] "SerBen !• AT 40,2 SIZE 286,492 P1XEI.S FONT "fSe^awa. i r««o «'« « 
list QPP fields I»«fiber.D.P.2,R,E!nBy.e.tESCiaJTOR.a(biiQ^^.K^ 

f^ai JSLSl?^^ " «»*r'.Kelv,tlca.,265 COWR 0 

iSf^ ^^r^'f^JLf^J^'^ 286,492 PIXELS FONT "Geneva- 7 otiLOH 0 0 n 

list Qpp fields wBi»ex.i3.f,z,z.mm,s,vsscsapraR,xncEQ.}^^ "'."'O' 

fSo ™d "'^^ «^'-Hel,;.tlc..,26S C0L» 0 

BCffiEM 1 TJfPE 0 HEADIHQ -Screen 1- AT 40,2 SIZS 286,492 PIXELS -Geneva-,7 cok O.O.O/ 
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liat OFF fields nuiaber,D,P*z,R,a7mi8iOBSCRZPim,BQntEQ,RFE^ FOR Rs'M* 

fiCRpr 1 TYPE 0. HEAD^G -Scar^^n 1- AT 40,3 S12b' 206,492 PIX2LS PCNT 'Si^fit&%Sh ^ficR 0 

7 ^guc A«ic acid aataboligm; »• * ' w 

SCS(EBH 1..7^ 0'UEM2I19S "Screcnl* AT 40,3 SIZE 286,492 PIXELS TCtTT "Geneva",? COLOR 0<OiO' 

list. OFF fields nuBiber,C,F,Z,R,Ermtir,S,SESCRZPTaR,Ba^^ fOR H«*t7* 

*SCRSQT'l TYPE 0 REAimx: 'Screen 1* AT 40,2 SIZE 286,452 PIXELS* PCNT 'Helvetica ",265 GOIOR 0 
""■•^ * • ■^ ?:'l4p id xoetaboliam: • * . — m.^ w 

3F?r rrAfiCRBBM l^^TSn?*) 0 HSM)IN3 "Screen !• AT 40,2 SIZE 286,492 PIXELS PCNT "Geneva", 7 COLOR 0.0.0. 
;^^v !-F;iliBtvOPF fields iiamber,D,P,Z,R,Bmiy,S,lffiSCiaPIOR,BCPREQ,R?BND,RA^ TOR Rb'W 

c; : f:'.SQRESK 1 TYPE 0 HEADZMQ "Seree» 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica', 265 COLOR 0 

• ;\t y ;r'Otto jgnayBia%ii' r., ; .r. . • r-* 
..c:] /.fECRHEN';! OTPS 0 ^aEADINS:." Screen 1" AT 40,2 SIZE 266,492 PIXELS FONT "Geneva",?* COLOR 0.0.0. 

.^z; . .list oji? f^idfe a^^ ' ' 

.^^Sqsm^l T3f»:0,:^Mmi© pixels FCNT ■HelyetiGB-,268 COLOR 0 

•v:; 0 HEADINB "Screen 1" AT 40,2 SIZE 286,492 PIXELS FOOT •Helvetica\26S COLOR 0 

^ ^ L? 'Str ess reoponsei* • .... * 

y;^ - ' ""fCS^-t TiP& Q HEApnO •Screen 1- » 40,2 SIZE 266,492 PIXELS PDNT "Geneva", 7 COLOR 0,0.0. 
i; ^ aiet OFF fields nuinber,D,rrZ»R,amcr,SrKBCRIPTOR,BGFR^ 



SCREEN 1 WPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492" PIXELS ^ fiONT "Helvetica", 265 omi'O 
? 'Structtealj' . v 

8CRHW 1 WPS 0 REM>B9S "Screen I" AT 40,2 SIZE 266,492 PIXELS FONT "Geneva", 7 COLOR 0.0.0 
list OTP fields. nuinber,D,F,2,R,airRy,-S,W£aaPTOR.BG^ r=Ik* 

SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Other clones:'

SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='X'

SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Clones of unknown function:'

SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='U'

DO "Test print.prg"

SET PRINT OFF

SET DEVICE TO SCREEN

CLOSE DATABASES

ERASE TEMPLIB.DBF

ERASE TEMPNUM.DBF

ERASE TEMPDESIG.DBF

SET MARGIN TO 0

CLEAR

LOOP
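
The compression step used by the subtraction program collapses a sorted library so that clones sharing the same entry and designation become a single record whose RFEND field holds the number of clones merged. The Python sketch below is an analogue of that idea, not a transcription of the FoxBASE+ loop. The dictionary keys `number`, `entry`, `d`, and `rfend` are modelled on the field names in the listing, and the sample records are hypothetical.

```python
from itertools import groupby


def compress(records):
    """Collapse records that share entry and designation into one record.

    records: list of dicts with keys 'number', 'entry', and 'd'.
    Returns one dict per (entry, d) pair, keeping the lowest clone number
    and adding 'rfend', the number of clones merged into that record.
    """
    ordered = sorted(records, key=lambda r: (r["entry"], r["d"], r["number"]))
    compressed = []
    for _, group in groupby(ordered, key=lambda r: (r["entry"], r["d"])):
        members = list(group)
        kept = dict(members[0])      # first record of the run is retained
        kept["rfend"] = len(members)  # abundance = size of the run
        compressed.append(kept)
    # Most abundant first, then by clone number.
    compressed.sort(key=lambda r: (-r["rfend"], r["number"]))
    return compressed


# Hypothetical sample records.
records = [
    {"number": 15003, "entry": "HSRPS8", "d": "E"},
    {"number": 15001, "entry": "HSRPS8", "d": "E"},
    {"number": 15002, "entry": "HUMUB", "d": "H"},
]
for row in compress(records):
    print(row["number"], row["entry"], row["d"], row["rfend"])
```

Sorting before grouping matters here because `groupby` merges only adjacent records, which parallels the SORT ON ENTRY,NUMBER that precedes the compression loop in the listing.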
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S&P OaUJC OFF 
SET FiUm> OFF' 

x^uCi' SIDR E 0 TO NiaiSi «3P3ect 

:* STORE 0 'TO ZOQ 

' ^ STORE a .to .Baii.. . ^ ; 

j^^.^u DO WmXB .T, ' • 

: Mwth^ (Single). fiat , 



XF Bailed 
: . Bcrgen 1 off 

g Eofa jeqto' 

JgCRTB FOR.LobkBEobiect ^'O^^^^ ^= 

to,.M0T.FO6HD()^^^ 

LOOP. 

• amp 



OT*3 Jtatiry TO gearcbval'^ ' 
^^ ••I^-p entry, dbf- 

SET ElCftCT OFF 
SHT HASBW OFP 

^!i^*!2P "lesarlptor.abf • 
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VXOf 

SNDIP 

BBONSE 

flORB Satzy TO Searefaval 

CXiOSS PATABRflBfl * * * 

SRASB "Loo}eup descrlptor.dbf ■ 

SET GXftCT ON ' - 

SNUCF ' 

' : ;c iOg Babxy/TO Searchval: . 

-^^.i*Horthem analysis for exitry ' 
' . ^7? Seafdbval 

^ •.. ./ . 

,7^ 'aicer. Y to proceed?:: ; 
WAIT TO OSC ' 

IP.OPMR(OK)o»y' 
- scre en 1 off . 

_ ENDtP- ' 'iti 

* COMPRESSION SUBROUTINE FOR Library.dbf

? 'Compressing the Libraries file now...'

USE "SmartGuy:FoxBASE+/Mac:fox files:libraries.dbf"

SET SAFETY OFF

SORT ON library TO "Compressed libraries.dbf"

* FOR entered
SET SAFETY ON

USE "Compressed libraries.dbf"
DELETE FOR entered=0

•SSL 

OCXJNT TO TOT 

MARK1 = 1
SW2=0

IF MARK1 >= TOT

PACK

SW2=1

LOOP

ENDIF
GO MARK1
STORE library TO TESTA
SKIP

STORE library TO TESTB

IF TESTA = TESTB
DELETE

MARK1 = MARK1+1
LOOP

* Northern analysis

? 'Doing the northern now...'

SET TALK ON

USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"

SET SAFETY OFF

COPY TO "Hits.dbf" FOR entry=searchval
SET SAFETY ON
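
The Northern analysis routine copies every clone whose entry equals the search value into a hits file. A plausible next step, assumed here rather than taken from the listing, is to tally those hits by library so that the distribution of one transcript across specimens can be read off. The Python sketch below shows that idea; the function name `northern`, the record keys, and the library labels are hypothetical.

```python
from collections import Counter


def northern(clones, searchval):
    """Select clones matching one entry and count them per library.

    clones: list of dicts with keys 'number', 'library', and 'entry'.
    Returns (hits, per_library), where per_library maps library -> hit count.
    """
    hits = [clone for clone in clones if clone["entry"] == searchval]
    per_library = Counter(clone["library"] for clone in hits)
    return hits, per_library


# Hypothetical clone records from two unnamed libraries.
clones = [
    {"number": 1, "library": "LIB_A", "entry": "HSMDNCF"},
    {"number": 2, "library": "LIB_A", "entry": "HSSOD"},
    {"number": 3, "library": "LIB_B", "entry": "HSMDNCF"},
    {"number": 4, "library": "LIB_A", "entry": "HSMDNCF"},
]
hits, per_library = northern(clones, "HSMDNCF")
for library, count in sorted(per_library.items()):
    print(library, count)
```

Raw hit counts are comparable between libraries only when similar numbers of clones were sequenced from each; otherwise they would need to be divided by the library totals.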
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• MftSlER. ANALYSIS 3/ VERSION 12-9-94 

SET TALK OFF

ctaa ..... 

SET DEVICE TO SCREEN

STORE NUMBER TO INITIATE

GO BOTTOM

STORE NUMBER TO TERMINATE

STORE 0 TO ENTIRE

STORE 0 TO CONDEN

STORE 0 TO ANAL

STORE 0 TO EMATCH

STORE 0 TO HMATCH

STORE 0 TO OMATCH

STORE 0 TO IMATCH

STORE 0 TO XMATCH

STORE 0 TO PRINTON

STORE 0 TO PTF

* Program.: Master analysis.fmt

* Date....: 12/ 9/94

* Version.: FoxBASE+/Mac, revision 1.10
* Notes...: Format file Master analysis

fl PDELS 54,261 GET aned STO^65536 FOOT -cSa™? ^5 n?^^.«! C<M^«d format- StZE 
e PIXELS Ii7;i26 GET EMM^^x£| 65S36roriT^SiL;i? '^.^^^/"""^er.-Sort/antiy, 
e-mms 135,126 SOT HMMraOTHfi Is536 PS l!!^ " ^IZE l5,fi2C0 

! "3'126 GET 0M2«rcH SlS| 111^ S .Sf^S^.'^f !!= Hcanologeufi- SIZE IS.l 



a S£S^ fP'"6. 5EP termiaate STYLE 0 TOOT 'G^^'- h sTiEik^n^^^ 
* EOF: Master analysis.fmt

READ

IF ANAL=9
CLEAR

CLOSE DATABASES
ERASE TEMPMASTER.DBF

USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"

SCREEN 1 OFF

RETURN

ENDIF

CLEAR
? INITIATE
? TERMINATE
? CONDEN
? ANAL
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ematdh . 
U3B "Uhique libraries /dbf* 

-^^^'e^^-^^ i'liJ»aH»/lii>rary,total,entered AT 0,0 
.USE -SniartGiy :PoxBAaE+/Mac : fox files tclones.dbf 

COPY STRUCTURE TO TEMPIiIB 
^.09E TEMPLIB 

'iP:a?nRE«i 

;; filesiClones.db^' 
, ENDIF 

; IP EOTIRE&2 • 
;USE "Utiigue libraries. dbf 

copy TO SELECTED TOR UPPER(i)«'y» . 
; USSSELBCTSD ' ' - •' ^ " • --^^^ -^-^ -^T v: --/T-- ^ .V. 

; STORE RSCCOUNTO TO -OTOKT 
:: MAKXnl 

* ..DO WKILE .T. 

IP MARfeSTOPit 
. OfiAR 

' 'E5C1T ' *: --' ■ • 

^ USE SELECTED 
; . GO MARK ' ^ 

> STORE library .TO THISCNE 

? 'COPYTOG * 
- . ?? THISOME 
: UBS TCMPLli'- ""^ vv;-.- 

S^fl^'S^^ «il^:Clones:dbf • W lit«^ ' ''''' 

LOO? 
EMDDO 
©IDIP 

USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
COUNT TO STARTOT

COPY STRUCTURE TO TEMPDESIG

USE TEMPDESIG

IF Ematch=0 .AND. Hmatch=0 .AND. Omatch=0 .AND. Imatch=0
APPEND FROM TEMPLIB
ENDIF

IF Ematch=1

APPEND FROM TEMPLIB FOR D='E'
ENDIF

IF Hmatch=1

APPEND FROM TEMPLIB FOR D='H'

ENDIF

IF Omatch=1

APPEND FROM TEMPLIB FOR D='O'
ENDIF

IF Imatch=1

APPEND FROM TEMPLIB FOR D='I' .OR. D='X' .OR. D='N'
ENDIF

IF Xmatch=1
APPEND FROM TEMPLIB FOR D='X'

ENDIF
COUNT TO ANALTOT
set talk off

DO CASE
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.fjSCT.MEVlCB TO PRIOT 

r^^?^^ ^ "Shear stress HUVEC l-Clone ll^h *-v?» 

■ ^-iSS't? ^ iiSSJiiJ-J^^. 

^l^^^""^"^^ Analysis- STYLE 65536 TOOT COLOR 0,0,0,-1, 

9 



•1,-1 



?? TIMBO 

7 Clone- nunibere ' 

bhraagh ' 
-;?.? STR(TSR>aNATS, 6, 0) 
? rlliibraries; ' 
. IP STPIREbI 
? 'All^;llbrarie3* 



^^2P EMnRE=2 " 

MJ^l 
.--r.. DO WHILE .T. ■ 
. -j IF.MftRK>STOPIT 

- mOTF ■ ^ ^r.. . ... . . 

USB SELEC3ED - . • 

-I/;- GO MARK ■•' 
?. ' ... 

?? TRIM(libaame) 
:Sara2.MARK+l :ro MARK 

toop - ■ » ^ . ^ 

■ QB3D0 " 
. BJDIF 

? 'Mslfimacions! ' 

lP,Etetch=0 .AND. KEnatch=0 -AND. anatch=0 .AND. 
EHDIP 

IF anatcshsl 
?? 'fiacact, • 
WDXP 

IF ttnatc5h-l 
?? 'Human,* 



I2^ICH«0 



IF Ooatchsl 
?^ 'Other. sp. * 
ENDXF 

IF 3znotdh=l 
7? 'INCME* 



IF Xrcatchsl 
7? 'EST* 
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? ^Qandansed .format analysis ' 



Sorted ky .INrEIREST* 

'Arraixged lay DISTRIBUTIQN' 
? Arranged ity PUNCTICNi:. 

.'ENDIF ; '..-rri. ^-^ < > . -...i, . , . , . . , 

7 /Total clones r^resentedi' ' " ' ■ '"^^ ' ■ ' ■ ■^^^'•"'^ ^'Vi^n- 

-?? STR(STARIEOT,6,0) ^ ; 

~? 'Total clones analy?edj ' 
STR(ANMJIor/6,0) • 

7'^'1*= library.„ -,id = designation £ » distribution z = location r = function c « cer 

r- . ^ ^ ^ ^ 

= US&TEMPDESIO ■ 

SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
DO CASE

CASE ANAL=1
* sort/number

SET HEADING ON

SORT TO TEMP1 ON ENTRY,NUMBER
DO "COMPRESSION number.PRG"



SORT TO TEMP1 ON NUMBER
USE TEMP1

list off fields number,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR

*list off fields number,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,RFEND

CLOSE DATABASES

ERASE TEMP1.DBF

ENDIF

CASE ANAL=2

* sort/DESCRIPTOR

SET HEADING ON

*SORT TO TEMP1 ON DESCRIPTOR,ENTRY,NUMBER/S for D='E' .OR. D='H' .OR. D='O' .OR. D='X' .OR. D='I'
*SORT TO TEMP1 ON ENTRY,DESCRIPTOR,NUMBER/S for D='E' .OR. D='H' .OR. D='O' .OR. D='X' .OR. D='I'
SORT TO TEMP1 ON ENTRY,START/S for D='E' .OR. D='H' .OR. D='O' .OR. D='X' .OR. D='I'
IF CONDEN=1

DO "COMPRESSION entry.PRG"



USE TEMP1

list off fields number,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,RFEND,INIT,I
CLOSE DATABASES
ERASE TEMP1.DBF

ENDIF
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'SET-lffiApijio m 



; s^t/lrterest • ; , , . 
'SET. aEaoitas cti ' " ' ' 



"S^irSSi!^^ °N EOTRy.HJMBER FOR I>0 
•g^CCWEFBSSICSN interest . PBG" 



: /ERASE OEMPl.c©? 



- • arrange/location ' - .^^ .mx:., . 

SET HERDING OM 
STORE 4 TO AMPLIFIE . ■ 



•DO .'Nprmai -subroutine !• 
-sENDIP 

? 'CVtoplasmic: ! . . 

~;g^C<?ggesBlon iQcatioa.pry* ^ 

..DO •Nottnal. auhroatine l« 
EKDIP , 

y 'Cyc'osXele&on; • . 

. ' locacion. prg" 

DO^Nonned. subroutine 1" 
EKDIP 

? 'Cell surface: ' 

DO "Conpression location. prg" 

DO •Monnal subroutine 1- 
SKDXF 

^ /^teacellular merahrane : ' 
^ "^^onpression location.prg* 



M "Norrol subroutine 1" 



? 'Mitochondrial:* 
^^^^ottpresQion location. prg" 



DO ■Nontal subroutine 1' 
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DO ^^aa^wreaaW 'location. prg" 



IX) ^iNoxnal flubroutind 1" 
? 'Otheri ' ' ' - 

SO TO; ON fl TOY.NIjMaER FIELDS RPBND, NUMBER, L,D,P, 2, K, C, ENTOY, S,DESCFaPTORrI£N^ 

r?;:ccKnBN=i":;' : , , . . 

DO '■Onrpfeaaion lbca£ionVinro^ ' ^ 
ELSE ^ ' 

DO; ;!Tjcpnal aubroutine !■ 
EJJDIPJ.' 
•Uti3diawn: ' 

TORT C3N BrmV.NUMBER FIELDS RFEato,NUMHER,L,D,P, Z,R,C,m?RY,S,DSSCiaPTOR,LEt^^ 
IF CONDENsl 

DO '';cpBp;^s$ioin location .pr^" 



DO /TNonoal subroutine I* 

ENDIF, • "".^ . -r 

IP OQNEENsl 

SET DEVICE TO PRINTER

SET PRINTER ON

EJECT

DO "Output heading.prg"
USE "Analysis location.dbf"
DO "Create bargraph.prg"
SET HEADING OFF

? ' FUNCTIONAL CLASS TOTAL UNIQUE NEW % TOTAL'

LIST OFF FIELDS Z,NAME,CLONES,GENES,NEW,PERCENT,GRAPH

CLOSE DATABASES
ERASE TEMP2.DBF
SET HEADING ON

*USE "SmartGuy:FoxBASE+/Mac:fox files:TEMPMASTER.dbf"
ENDIF

CASE AMALsS ' 

♦ arrange/distriimtion ■*:'* 

^T HBADINQ CM 

STORE 3 TO At^LIFISR 

? •Cell/ciaaue specific distribution: • 

SORT CM ENrRY,NUMBER FIELDS RPEOT, NUMBER, L i Dj F, Z i R, C, EimiY, S, DESCRIPTOR, LmnW, INIT, I, CQb^^ 
IF OONDENbI . _ • 
DO "Coopression discrib.prg* 



DO "Normal subroutine 1" 

ENDIF 

7 'Non-specific distributionj • 

SORT ON anPRV^NUMBER FIELDS RFEND, NUMBER, L,D,F, 2, R,C,EtnW,S,DS5CRrPT0R,LE3IGTH,im 
IF CQNDENal 

PO "Coicaression distrib-prg" 



DO "Norxnal subroutina 1" 
ENDIF 

? 'Untoown distribution: • 

SORT EWTRY^NUMBER FIELDS RPSOT, NUMBER, L,D,F, 2, R,C|EJ7ZOT,S. DESCRIPTOR, UNGrTH. IN^ 
IP CONDEltel 

DO "Ccnprefisioin distrib.prg" 
ELSE 

DO "Nonral subroutine 1" 
£»OIF 

IF CONDENel 

SET DEVICE TO KONTOR 

SET PRUTTER W 
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.raSv'Analysls distribution, dbf 
flO>"Create baoigraph.prg." 
:5S?; HEM>IN3 OFF 

SST: HEADING Qr7 

^US^.'SnartGuy:PoxBASB+/toc:rox fnes:'CEMEMASTSR.dbf - 

CASS ma^i 

?vwrange/f\2ac tlon u 

,€ST HEADING ON 

BTORE 10 10 AMPLIFIER 

^f.r.. BINDING PROTEINS* 

L,^S^*2fJ?*'^^^®« ^ receptors 5' 

DO/^Cocpression functlon.pra" 
ELSE ' 

» -•Nonnal cubroutine 1" 
L^iS^^"*^^^^^ proteins: • 

DOn^Corrpression function .pro- 
ELSE 

W.'Noaaal Bubrbutine I" 
V 'Ligands >ani effectors t * 

ro^ecjqpression , function, prg- 

00 'Normal ,s>abroutine !• 
SNDIF. ! . - . 

2^2^?^ binding proteins: • 
2f ?Can5)«ssion function.prg- 



DO ■Normal subroutine 1^ 

•EJECT ^ - : y .^y^ r ^.^^ \ : 

?/'■:■ . CNCX3QENES' 

? 'General oncogenes: • 



DO •Normal subroutine 1" 
'IS^binding proteins 1 • 

Fim)S HFH^,«U^^,L,D,F,Z,R,C,Er^^ 

2?w^^*^^®^<^»^^^tion.prg« 



» •Nonnal subroutine 1" 
EfclDif 

? 'Viral elements I' 
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ELBS "* ' 

CX) "Normal subroutine 1" 
EUDIP. 

? rjctt^sei;:«3<3'' Phosphatases:' 

S^OC^S^'^^ 5TELDS OTEND,mBER,L,D,P,2,R,C,E2WRy,S,DB^^ 
do; ,r6dTOr0ssi6n' function 

DO •Kormal subroutine 1" 
EH)I? 

? 'Tumor-related antigens r 

Jlv CQNwJu^sl 

DO •Compreaalon function. prg' 



DO "NoOTal subroutine 1» 
*EJBCT 

7 ' PROTEIN SaCTHEnC MACHIN2RY PR0T2IN3' 

? 'Transcription and Nucleic Acid-binding proteins: » 

g*^Q^^^Y,NUMBER FIEUDS ilFQ©,NUMBSR,L,D,F,Z,R,C, SJTOY, S, DESCRIPTOR 
DO •Coanpreasion function.prg* 
DO •Nonnal subroutine !• 

ENDIP 

? 'Translation! • 

S'^IiSLJ^^'^'™^^ ra^/NtMER,L,D,F,Z,R,C,ENTRY,S,OESCRIPTOR,I£^ 
IF CGKnOTal 

DO "Ccoipresalon fuxiction.prg" 
ELSB 

DO •Normal subroutine !• 

SNDIF 

? 'RibOBooal proteins: ' 

fS^LSL?^^'^''^^^*^ FIELDS PJPEm,NUKBEa,L,D,F,2,R,C,ENTRy,S,DESCRIPTOR,La^ 

DO 'Costipression function. prg" 
EL£E 

DO "Nortnal sxibroutine 1" 
£NDIF 

7 'Protein 3proo«ssingt * 

MRTOTIMPKSf, NUMBER FIELDS RP©©, NUMBER, L,D,F,a,R,C,E3?IRy,S, DESCRIPTOR, ISNGmZ^ 
IF CQwDOlal 

DO 'Compression function .prg". 

DO *Nozsial subroutine 1' 
HNDIF 

7 • 

9 



? 'Ferrpprotelnsr 



SORT ON ENTRy,NDMBER FIELDS RPHTO, NUMBER, L.D,F, 2, R,C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I ^ 
IF CCNCEQIal 

DO "Compression functlon.prg" 

DO 'Nomal subroutine 1' 
ENDIF 

7 'Proteases and inhibitors:* 

SORT ON QJTRY^NUMBER FIELDS RPEND, NUMBER, L,D>,Z,R,C,ETniY,S,Di2SCRIPTOR, LENGTH, INIT, I, 
IF OGRDiSiN&l 

I» 'Conipression function.prg" 
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?;^drdsit'ii«i -phosphorylation: • 

DO "Compression function.prg"

DO "Normal subroutine 1"
? 'Sugar metabolism:'

DO "Compression function.prg"
DO "Normal subroutine 1"
? 'Amino acid metabolism:'

DO "Normal subroutine 1"
? 'Nucleic acid metabolism:'

DO "Compression function.prg"

DO "Normal subroutine 1"
ENDIF

? 'Lipid metabolism:'

SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R

DO "Compression function.prg"

DO "Normal subroutine 1"
? 'Other enzymes:'

DO "Compression function.prg"

DO "Normal subroutine 1"

ENDIF

*EJECT

?

? ' MISCELLANEOUS CATEGORIES'

? 'Stress response:'

DO "Compression function.prg"
DO "Normal subroutine 1"

? 'Structural:'

SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R

DO "Compression function.prg"

DO "Normal subroutine 1"
ENDIF

? 'Other clones:'
DO "Compression function.prg"

? 'Clones of unknown function:'

SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR
IF CONDEN=1

DO "Compression function.prg"

DO "Normal subroutine 1"

IF CONDEN=1
EJECT

*SET DEVICE TO PRINTER
*SET PRINT ON
DO "Output heading.prg"

USE "Analysis function.dbf"
DO "Create bargraph.prg"
SET HEADING OFF

SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",12 COLOR 0,0,0

?

? ' TOTAL TOTAL NEW DIST'

? ' FUNCTIONAL CLASS CLONES GENES GENES FUNCTIONAL CLASS'

?

*LIST OFF FIELDS P,NAME,CLONES,GENES,NEW,PERCENT,GRAPH,COMPANY
LIST OFF FIELDS P,NAME,CLONES,GENES,NEW,PERCENT,GRAPH
CLOSE DATABASES
ERASE TEMP2.DBF
SET HEADING ON

*USE "SmartGuy:FoxBASE+/Mac:fox files:TEMPMASTER.dbf"
CASE ANAL=8

DO "Subgroup summary 3.prg"
ENDCASE

DO "Test print.prg"

SET PRINT OFF

SET DEVICE TO SCREEN

CLOSE DATABASES

*ERASE TEMPLIB.DBF

*ERASE TEMPNUM.DBF

*ERASE TEMPDESIG.DBF

*ERASE SELECTED.DBF

CLEAR

LOOP

ENDDO
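
The listing above ends by printing, for each functional class, the CLONES, GENES and NEW totals together with PERCENT and GRAPH columns. A minimal Python sketch of how such columns might be derived is given here. It assumes that PERCENT is each class's share of all clones and that GRAPH is a simple text bar; the listing does not confirm either, and the function name, bar scale and sample figures are hypothetical.

```python
def class_summary(rows, stars_per_percent=0.2):
    """Add 'percent' and 'graph' columns to per-class totals.

    rows: list of dicts with keys 'name', 'clones', 'genes', 'new'.
    Assumption: percent is the class's share of all clones.
    """
    total = sum(r["clones"] for r in rows)
    summary = []
    for r in rows:
        percent = 100.0 * r["clones"] / total if total else 0.0
        bar = "*" * round(percent * stars_per_percent)  # one star per 5 percent
        summary.append({**r, "percent": round(percent, 1), "graph": bar})
    return summary


# Hypothetical totals for two of the functional classes named in the listing.
rows = [
    {"name": "Translation", "clones": 30, "genes": 10, "new": 2},
    {"name": "Structural", "clones": 10, "genes": 5, "new": 1},
]
for r in class_summary(rows):
    print(f"{r['name']:<14}{r['clones']:>5}{r['genes']:>5}{r['new']:>5}"
          f"{r['percent']:>7} {r['graph']}")
```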
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* COMPRSSSICN-SraRCfOTlffi FOR. ANALYSIS PXXSmsS 

COUNT TO TOT

REPLACE ALL RFEND WITH 1

DO WHILE SW2=0 ROLL
PACK

COUNT TO UNIQUE

COUNT TO NEWGENES FOR D='H' .OR. D='O'
SW2=1

LOOP

GO MARK1

STORE ENTRY TO TESTA
SW = 0

DO WHILE SW=0 TEST

SKIP

STORE ENTRY TO TESTB
IF TESTA = TESTB

DUP = DUP+1
LOOP

GO MARK1

REPLACE RFEND WITH DUP
MARK1 = MARK1+DUP
SW=1
LOOP

ENDDO TEST
LOOP

ENDDO ROLL

GO TOP

STORE Z TO LOC

USE "Analysis location.dbf"

LOCATE FOR Z=LOC

REPLACE CLONES WITH TOT

REPLACE GENES WITH UNIQUE

REPLACE NEW WITH NEWGENES

USE TEMP1

SORT ON RFEND/D TO TEMP2
USE TEMP2

?? STR(UNIQUE,5,0)

?? ' genes, for a total of '
?? STR(TOT,5,0)
?? ' clones'

? ' V Coincidence'

list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR

*SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF
USE TEMPDESIG
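
The compression subroutine above walks a table sorted on ENTRY, counts the run of clones that share each entry into RFEND, and then lists the result in descending order of RFEND. The same idea can be sketched in Python as follows. This is an illustration under stated assumptions, not the patent's implementation: the record layout is reduced to three fields, the function name is hypothetical, and "EXAMPLE2" is a made-up entry name.

```python
from itertools import groupby


def compress(records):
    """Collapse clones that share an ENTRY into one row carrying an RFEND count."""
    ordered = sorted(records, key=lambda r: (r["entry"], r["number"]))
    compressed = []
    for _entry, group in groupby(ordered, key=lambda r: r["entry"]):
        clones = list(group)
        row = dict(clones[0])        # keep the lowest-numbered clone as representative
        row["rfend"] = len(clones)   # abundance: number of clones matching this entry
        compressed.append(row)
    # Most abundant first, mirroring SORT ON RFEND/D
    compressed.sort(key=lambda r: (-r["rfend"], r["number"]))
    return compressed


clones = [
    {"number": 1, "entry": "HUMEF1B", "d": "H"},
    {"number": 2, "entry": "EXAMPLE2", "d": "H"},
    {"number": 3, "entry": "HUMEF1B", "d": "H"},
]
summary = compress(clones)
tot, unique = len(clones), len(summary)
print(f"{unique} genes, for a total of {tot} clones")
for row in summary:
    print(row["number"], row["rfend"], row["entry"])
```

Grouping a sorted list plays the role of the nested ROLL and TEST loops: each group is one run of identical ENTRY values.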
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* COHPICSSStON SOBKOOTINS'PORs^^ 
USE TEMP1

MARK1 = 1
SW2=0

DO WHILE SW2=0 ROLL
IF MARK1 >= TOT
PACK

SW2=1

GO MARK1
DUP = 1

STORE ENTRY TO TESTA
SW = 0

DO WHILE SW=0 TEST

SKIP

STORE ENTRY TO TESTB
IF TESTA = TESTB

DUP = DUP+1

LOOP

ENDIF
GO MARK1

REPLACE RFEND WITH DUP

MARK1 = MARK1+DUP

SW=1

LOOP

ENDDO TEST

LOOP

ENDDO ROLL
*BROWSE

*SET PRINTER ON

SORT ON DATE TO TEMP2

USE TEMP2

?? STR(UNIQUE,4,0)

?? ' genes, for a total of '

?? STR(TOT,4,0)

?? ' clones'

?

? ' V Coincidence'

COUNT TO P4 FOR I=4

IF P4>0

? STR(P4,3,0)

?? ' genes with priority = 4 (Secondary analysis:)'

list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT for I=4
?

ENDIF

COUNT TO P3 FOR I=3

IF P3>0

? STR(P3,3,0)

?? ' genes with priority = 3 (Full insert

list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT for I=3

COUNT TO P2 FOR I=2

IF P2>0

? STR(P2,3,0)

?? ' genes with priority = 2 (Primary analysis complete:)'

list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT for I=2
ENDIF

COUNT TO P1 FOR I=1
IF P1>0
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^^^^^ for 1.1 

i;:'SEt'i^iOT "d^ .-^'..r--: - • ,0^ ^t..- [f^'f^^-^- 'ci^'rC.[l ... 

ERASE TEMP1.DBF

USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
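
The interest-ordered listing above counts the genes at each priority level held in field I (4 down to 1) and prints a heading only for levels that are populated. A short Python sketch of that reporting step follows. The function name and sample records are hypothetical, and the descriptive labels printed after each priority in the listing are omitted because some of them are not legible in the source.

```python
from collections import Counter


def priority_report(genes):
    """Return one heading line per populated priority level, highest first."""
    counts = Counter(g["i"] for g in genes)
    lines = []
    for level in (4, 3, 2, 1):
        if counts[level] > 0:  # mirrors IF P4>0 ... ENDIF
            lines.append(f"{counts[level]:3d} genes with priority = {level}")
    return lines


genes = [{"number": 10, "i": 4}, {"number": 11, "i": 2}, {"number": 12, "i": 2}]
for line in priority_report(genes):
    print(line)
```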
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* C0KPRS5SZON SUBROUTZNB FOR i^OLVSIS PROGRAMS 
USE TEMP1

MARK1 = 1

SW2=0

DO WHILE SW2=0 ROLL
IF MARK1 >= TOT

LOOP

STORE ENTRY TO TESTA

SW = 0

DO WHILE SW=0 TEST

SKIP
STORE ENTRY TO TESTB

IF TESTA = TESTB
DUP = DUP+1

GO MARK1

REPLACE RFEND WITH DUP
MARK1 = MARK1+DUP

SW=1

LOOP

ENDDO TEST
LOOP

ENDDO ROLL
*BROWSE

*SET PRINTER ON

SORT ON NUMBER TO TEMP2

USE TEMP2

?? STR(UNIQUE,4,0)

?? ' genes, for a total of '

?? STR(TOT,5,0)

? ' V Coincidence'

list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR

*SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF

USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
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J^jgOMPRESSIQN SU3R0OTIN2 FOR Al^^YSlS PROCSRftMS 

COUNT TO TOT

REPLACE ALL RFEND WITH 1

IF MARK1 >= TOT

PACK

COUNT TO UNIQUE

COUNT TO NEWGENES FOR D='H' .OR. D='O'

SW2=1

LOOP

GO MARK1
DUP = 1

STORE ENTRY TO TESTA

SW = 0

DO WHILE SW=0 TEST
SKIP

STORE ENTRY TO TESTB

IF TESTA = TESTB

DELETE

DUP = DUP+1

LOOP

ENDIF
GO MARK1

REPLACE RFEND WITH DUP

MARK1 = MARK1+DUP

LOOP

ENDDO TEST

ENDDO ROLL
GO TOP

STORE R TO FUNC
USE "Analysis function.dbf"
LOCATE FOR P=FUNC
REPLACE CLONES WITH TOT
REPLACE GENES WITH UNIQUE
REPLACE NEW WITH NEWGENES
USE TEMP1

SORT ON RFEND/D TO TEMP2

USE TEMP2

SET HEADING ON

?? STR(UNIQUE,5,0)

?? ' genes, for a total of '

?? STR(TOT,5,0)

?? ' clones'

? ' V Coincidence'

list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR

*i2f^£f SlfdJ ''''''' ^ 

*SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF
USE TEMPDESIG
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n^COsOIWS&SIW SW^ PR0GHAK5 
REPLACE ALL RFEND WITH 1

DO WHILE SW2=0 ROLL

COUNT TO UNIQUE
LOOP

DUP = 1

STORE ENTRY TO TESTA

DO WHILE SW=0 TEST

SKIP
STORE ENTRY TO TESTB

DUP = DUP+1
LOOP

GO MARK1

REPLACE RFEND WITH DUP
MARK1 = MARK1+DUP

LOOP

ENDDO TEST
LOOP

ENDDO ROLL

STORE F TO DIST

USE "Analysis distribution.dbf"

LOCATE FOR P=DIST

REPLACE CLONES WITH TOT

REPLACE GENES WITH UNIQUE
USE TEMP1

SORT ON RFEND/D TO TEMP2
USE TEMP2

?? STR(UNIQUE,5,0)

?? ' genes, for a total of '

?? STR(TOT,5,0)

?? ' clones'

? ' V Coincidence'

list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR

*SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF

USE TEMPDESIG
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yCCmr TO TOT 

REPLACE ALL RFEND WITH 1

MARK1 = 1

DO WHILE SW2=0 ROLL
IF MARK1 >= TOT

PACK

COUNT TO UNIQUE

LOOP

ENDIF
GO MARK1

STORE ENTRY TO TESTA
SW = 0

DO WHILE SW=0 TEST
SKIP

STORE ENTRY TO TESTB
IF TESTA = TESTB

DUP = DUP+1
LOOP
ENDIF
GO MARK1

REPLACE RFEND WITH DUP

SW=1
LOOP

ENDDO TEST
LOOP

ENDDO ROLL

GO TOP

USE TEMP1

?? STR(UNIQUE,5,0)

?? ' genes, for a total of '

?? STR(TOT,5,0)

?? ' clones'

? ' V Coincidence'

list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR
*SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
USE TEMPDESIG
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COUNT TO TOT 

REPLACE ALL RFEND WITH 1

MARK1 = 1

SW2=0

DO WHILE SW2=0 ROLL
IF MARK1 >= TOT
PACK

COUNT TO UNIQUE

SW2=1

LOOP

ENDIF
GO MARK1
DUP = 1

STORE ENTRY TO TESTA
SW = 0

DO WHILE SW=0 TEST
SKIP

STORE ENTRY TO TESTB

IF TESTA = TESTB

DELETE

DUP = DUP+1

LOOP

ENDIF
GO MARK1

REPLACE RFEND WITH DUP
MARK1 = MARK1+DUP

LOOP

ENDDO TEST
LOOP

ENDDO ROLL
*BROWSE

*SET PRINTER ON

SORT ON RFEND/D, NUMBER TO TEMP2
USE TEMP2

REPLACE ALL START WITH RFEND/IDGENE*10000

?? STR(UNIQUE,5,0)

?? ' genes, for a total of '

?? STR(TOT,5,0)

?? ' clones'

? ' Coincidence V V Clones/10000'

set heading off

CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF

USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"

SCREEN 1 TYPE 0 HEADING
list fields number, RFEND,
*SET PRINT OFF
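
The abundance-ordered listing above rescales each RFEND count with the expression RFEND/IDGENE*10000 and prints it under the heading "Clones/10000". A small Python sketch of that normalization follows. It assumes IDGENE holds the number of clones against which the count is normalized, which the listing does not state, and the function name is hypothetical.

```python
def clones_per_ten_thousand(rfend, idgene):
    """Express a clone count as clones per 10,000 of the reference total."""
    if idgene <= 0:
        raise ValueError("idgene must be positive")
    # Multiply before dividing so round figures stay exact in floating point.
    return rfend * 10000 / idgene


# Example: an entry seen 5 times when the reference total is 2,000 clones.
print(clones_per_ten_thousand(5, 2000))
```

Normalizing in this way allows abundance figures from libraries of different sizes to be compared on a common scale.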
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^J^Q^SIQN eOBROUriNE FOR ANALYSIS PROGRAMS 

S^^.!S.E:?;:S:£:J::S;£:^^::S;£:s::S:£:5::S:S:^^o,...^ 



COUNT TO TOT

MARK1 = 1

DO WHILE SW2=0 ROLL
IF MARK1 >= TOT
PACK

COUNT TO UNIQUE

LOOP

ENDIF
GO MARK1
DUP = 1

STORE ENTRY TO TESTA
SW = 0

DO WHILE SW=0 TEST
SKIP

STORE ENTRY TO TESTB
IF TESTA = TESTB

DUP = DUP+1
LOOP
ENDIF
GO MARK1

REPLACE RFEND WITH DUP
MARK1 = MARK1+DUP

LOOP

ENDDO TEST
LOOP

ENDDO ROLL
*BROWSE

*SET PRINTER ON

SORT ON RFEND/D, NUMBER TO TEMP2
USE TEMP2

REPLACE ALL START WITH RFEND/IDGENE*10000
?? STR(UNIQUE,5,0)

?? ' genes, for a total of '
?? STR(TOT,5,0)
?? ' clones'

? ' Coincidence V V Clones/10000'

SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE

CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF

USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
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USE^TEKPl, . 

?? ' Total of '
?? STR(TOT,4,0)

*list off fields number,L,D,F,Z,R,C,ENTRY,DESCRIPTOR

list off fields number,L,D,F,Z,R

CLOSE DATABASES
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*titekcan it«nu> version v 
Bet device^'to screen 

^™^?^S:f°'*^'^^'fo>^ files: clones. dbf 
STOEE LOPCMEO TO IMate 

STORE REGNO 0 TO eloneno '-^ ; 

STORE 6 TO Cfo)ser 

i?^^*"^ "^^^^ toaui.fint 

* Date,V'..r .1/11/95 

* wSfff*^-' ^SXHASB+/Macr-twii^^ 1.10 

* * • ' I4f«»seq menu. 

2 SSS^ 18,126 TO 77,365 2S479 CO 

•EOF: Lifeseg ntfittu.fiott: ' ^ -^--j- .^^o;.-. y-;..^^ , , ..^..^^^ . 

DO C3^ 

Caiss Chooserel 

S^SSSS^f~^*^'^'=»^ fil^s.-Oitput prograi»,M.ster analysis S.prg- 
•Ss^Sf^"'®^^'^'"* fil«B:Output progr«ns,Subtraction 2.prg" 

Ss^^lSif*^'^"-*^''* (Single) .prg. 

USE 'Iilhraries.dbf- 



CftiSB 'ChooseritS • ■ 

»-Sgrt0^roxEASE*/Mac:fox fileB:Output progxa,».see individual clone.prg- 

Ss;^^-f°'^^°'*'« files.Libr^es.Output prograi»s:Menu.pxy. 

SG^2QI 1 OFF 
RETORN 

LOOP 
EHDDO 
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ei,30 SAY ■Database Subset Analyais* STYLE 

? 'Clone numbers '
?? STR(INITIATE,6,0)

?? STR(TERMINATE,6,0)
? 'Libraries: '
IF ENTIRE=1
?? 'All libraries'
ENDIF

IF ENTIRE=2

MARK=1

DO WHILE .T.

IF MARK>STOPIT

EXIT

ENDIF
USE SELECTED
GO MARK

?? TRIM(libname)
STORE MARK+1 TO MARK
LOOP

ENDDO
ENDIF

? 'Designations: '

IF Ematch=0 .AND. Hmatch=0 .AND. Omatch=0
?? 'All'

ENDIF
IF Ematch=1

?? 'Exact,'

ENDIF
IF Hmatch=1
?? 'Human,'
ENDIF

IF Omatch=1

?? 'Other'

ENDIF

IF CONDEN=1

? 'Condensed format analysis'

ENDIF

IF ANAL=1

? 'Sorted by NUMBER'

ENDIF

IF ANAL=2

? 'Sorted by ENTRY'

ENDIF
IF ANAL=3

? 'Arranged ABUNDANCE'

ENDIF

IF ANAL=4

? 'Sorted by INTEREST'

ENDIF

IF ANAL=5

? 'Arranged LOCATION'

ENDIF

IF ANAL=6

? 'Arranged DISTRIBUTION'

ENDIF

IF ANAL=7

? 'Arranged by FUNCTION'



6 FONT "Geneva", 274 COLW 0,0,0,-1,-1,-1 
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USE ICMPl 
COUNT TO TOT
?? ' Total of '
?? STR(TOT,4,0)

?? ' clones'

*list off fields L,D,F,Z,R,C,ENTRY,DESCRIPTOR,LENGTH,RFEND,INIT,I
list off fields number,L,D,F,Z,R,C,ENTRY,descriptor

CLOSE DATABASES
ERASE TEMP1.DBF
USE TEMPDESIG
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?? • total Of ' ^ ^ ^^^^ - --^'iv -.vK^:- . . , 
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. .... • 

*Northern (single), version 11-25-94
close databases

SET TALK OFF

SET PRINT OFF

SET EXACT OFF

CLEAR

fiTOR E ' n? I hs .. r . ' TO Obbject 

STORE 0 TO Numb

STORE 0 TO Zog

STORE 1 TO Bail
DO WHILE .T.

* Program.: Northern (single).fmt

* Date....: 8/ 8/94

* Version.: FoxBASE+/Mac, revision 1.10

* Notes...: Format file Northern (single)

SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",12 COLOR 0,0,0
@ PIXELS 15,81 TO 46,397 STYLE 28447 COLOR 0,0,-1,-25600,-1,-1
@ PIXELS 89,79 TO 192,422 STYLE 28447 COLOR 0,0,0,-25600,-1,-1
@ PIXELS 115,98 SAY "Entry #:" STYLE 65536 FONT "Geneva",12 COLOR 0,0,0,-1,-1,-1
@ PIXELS 115,173 GET Eobject STYLE 0 FONT "Geneva",12 SIZE 15,142 COLOR 0,0,0,-1,-1,-1
@ PIXELS 145,89 SAY "Description" STYLE 65536 FONT "Geneva",12 COLOR 0,0,0,-1,-1,-1
@ PIXELS 145,173 GET Dobject STYLE 0 FONT "Geneva",12 SIZE 15,241 COLOR 0,0,0,-1,-1,-1
@ PIXELS 35,89 SAY "Single Northern search screen" STYLE 65536 FONT "Geneva",274 COLOR 0,0,-
@ PIXELS 220,162 GET Bail STYLE 65536 FONT "Chicago",12 PICTURE "@*R Continue;Bail out" SIZE
@ PIXELS 175,98 SAY "Clone #:" STYLE 65536 FONT "Geneva",12 COLOR 0,0,0,-1,-1,-1
@ PIXELS 175,173 GET Numb STYLE 0 FONT "Geneva",12 SIZE 15,70 COLOR 0,0,0,-1,-1,-1
@ PIXELS 80,152 SAY "Enter any ONE of the following:" STYLE 65536 FONT "Geneva",12 COLOR -1,

* EOF: Northern (single).fmt

IF Bail=2

CLEAR

screen 1 off

RETURN

USE "SmartGuy:FoxBASE+/Mac:fox files:Lookup.dbf"

IF Eobject<>' '

STORE UPPER(Eobject) to Eobject

SET SAFETY OFF

SORT ON entry TO "Lookup entry.dbf"

SET SAFETY ON

USE "Lookup entry.dbf"

LOCATE FOR Look=Eobject

IF .NOT.FOUND()

CLEAR

LOOP

ENDIF

BROWSE

STORE Entry TO Searchval
CLOSE DATABASES
ERASE "Lookup entry.dbf"
ENDIF

IF Dobject<>' '
SET EXACT OFF
SET SAFETY OFF

SORT ON descriptor TO "Lookup descriptor.dbf"
SET SAFETY ON

USE "Lookup descriptor.dbf"

LOCATE FOR UPPER(TRIM(descriptor))=UPPER(TRIM(Dobject))

IF .NOT.FOUND()

CLEAR
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EKDIP , . 

STORE Entry TO Searchval
CLOSE DATABASES

ERASE "Lookup descriptor.dbf"
SET EXACT ON
ENDIF

IF Numb<>0

BROWSE

STORE Entry TO Searchval
ENDIF

CLEAR

?? Searchval
? 'Enter Y to proceed'

WAIT TO OK

CLEAR

IF UPPER(OK)<>'Y'

screen 1 off

RETURN

* COMPRESSION SUBROUTINE FOR Library.dbf

SORT ON library TO "Compressed libraries.dbf"

SET SAFETY ON

USE "Compressed libraries.dbf"

DELETE FOR entered=0

PACK

COUNT TO TOT

MARK1 = 1

SW2=0

DO WHILE SW2=0 ROLL

IF MARK1 >= TOT

PACK

SW2=1

LOOP
ENDIF
GO MARK1

STORE library TO TESTA
SKIP

STORE library TO TESTB
IF TESTA = TESTB

DELETE

MARK1 = MARK1+1
ENDDO ROLL

*Northern analysis
CLEAR

♦Northern analysis 
CX£AR 
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CLOSB DATABASES '■'•^Orlin- Wij--;, ^ ■: ,t ,i . ■'• f.^rfp^'*. i j:; 

SELECT 1

USE "Compressed libraries.dbf"

STORE RECCOUNT() TO Entries

USE "Hits.dbf"

DO WHILE .T.

SELECT 1

IF Mark>Entries

EXIT

GO MARK

STORE library TO Jigger
SELECT 2

COUNT TO Zog FOR library=Jigger

REPLACE hits WITH Zog

Mark=Mark+1

LOOP

ENDDO

SELECT 1

BROWSE FIELDS LIBRARY,LIBNAME,ENTERED,HITS

CLEAR

? 'Enter Y to print:'

WAIT TO PRINSET

IF UPPER(PRINSET)='Y'

SET PRINT ON

CLEAR

EJECT

SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",14 COLOR 0,0,0
? 'DATABASE ENTRIES MATCHING ENTRY '

?? Searchval

? DATE()

SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
LIST OFF FIELDS library, libname, entered, hits

?

SELECT 2

LIST OFF FIELDS NUMBER, LIBRARY, D,S,F,Z,R,ENTRY

SET TALK OFF

SET PRINT OFF

ENDIF

CLOSE DATABASES
SET TALK OFF
CLEAR

DO "Test print.prg"
RETURN
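
The Northern (single) program above looks up one database entry and then counts, library by library, how many clones matched it, storing the count in the hits field. A minimal Python sketch of that tally follows. The function name and the in-memory record layout are assumptions made for illustration; the sample rows reuse the library codes and entry name shown in the table that follows the listing.

```python
from collections import Counter


def electronic_northern(clones, entry):
    """Count, per library, the clones whose ENTRY equals the searched value."""
    hits = Counter(c["library"] for c in clones if c["entry"] == entry)
    # Libraries with the most hits first; ties broken alphabetically.
    return sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))


clones = [
    {"number": 2304, "library": "U937NOT01", "entry": "HUMEF1B"},
    {"number": 3240, "library": "HMC1NOT01", "entry": "HUMEF1B"},
    {"number": 3269, "library": "HMC1NOT01", "entry": "HUMEF1B"},
]
for library, count in electronic_northern(clones, "HUMEF1B"):
    print(library, count)
```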
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library 
AOENtNBOl 
ADRBIOR01 
ADHENOTOI 
AMLfiNOTDI 

BMARNOns 
CARONOTOI 
CHAONOTQI 

RBRAhfTOI 

HMC1NOTD1 
HUVELPBOl 
HUVEWOB01 
HUVHSTBOi 
HYPCWOBOl 
KfDNNOTOI 
UVRNOTO1 
LUNGNOTDI 
MUSCNOT01 
OVIOND&OI 
PANCNOTOl 

prriMOToi 

PUCNOB01 

SPUJFET01 
Sf>tNNaTt)2 
ST0MNOT01 
8YNORAB01 
TBLWOTOI 
TCSTNOTOl 
THP1NOB01 
THPIPEBOt 
THP1PLB01 
U937NOT01 



iibname 
inflamed fidenoM 
Adrenal gland (0 
Adrenal 9tond 01 > 
AML blast eeUs (T) 
Bonemarrbw i . 
Bone marrow fT) 
Caitfac musde fT) 
CWn. hamsl«r ovary 
CerrieaJ ctroma 
Rb/oWaat. ATS 
Rbfoblast, AT30 
Rbrobiast AT 
Rbroblasl.:w5 
Rbrablaet. uv 30 
Rbroblasl ^ , , 
RbroblasU normal ' 
Masi ceO Una HMC-1 
'HUVECIFN,7NF,ti>s 
HUVEC conrrol 
HUVEC Shear stress 
Hypothetamua 
WdneyCD 
UwerfT) 

timgfO ' 5 
SkaJeiai muficfe (T) 
Oviduct ' • i 
Pancreas, normal 
Pituitary (r) 
PItulla/y (T) 
Placenta ' ' ' 
Small intestine 
BpleenrTiver. fard 
Spleen fl) 
'Stomach ^ 
Rhsum. synovium 
T4 Btymphoblast 
Tearia fT) 

THP-i coniml . 
THP phorbol 
THM phorbol US 
US37, monocytic leuk 



.... TABLE 



number  library    d  s  f  z  r  entry    descriptor
2304    U937NOT01  E  H  C  C  T  HUMEF1B  Elongation factor 1-beta
3240    HMC1NOT01  E  H  C  C  T  HUMEF1B  Elongation factor 1-beta
3269    HMC1NOT01  E  H  C  C  T  HUMEF1B  Elongation factor 1-beta
«93     HMC1NOT01  E  H  C  C  T  HUMEF1B  Elongation factor 1-beta
8389    HMC1NOT01  E  H  C  C  T  HUMEF1B  Elongation factor 1-beta
8139    HMC1NOT01  E  H  0  C  T  HUMEF1B  Elongation factor 1-beta



r^atariatari rfend 



0 
370 
371 
470 
327 
375 



773 
773 
773 
773 
773 
773 
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WHAT IS CLAIM^'lSl 'V I , f^-^ -.V r ^^^y^ l\ 

1. A method of analyzing a specimen containing gene
transcripts, said method comprising the steps of:

(a) producing a library of biological sequences;
(b) generating a set of transcript sequences, where
each of the transcript sequences in said set is indicative
of a different one of the biological sequences of the
library;

(c) processing the transcript sequences in a
programmed computer in which a database of reference
transcript sequences indicative of reference biological
sequences is stored, to generate an identified sequence
value for each of the transcript sequences, where each said
identified sequence value is indicative of a sequence
annotation and a degree of match between one of the
transcript sequences and at least one of the reference
transcript sequences; and

(d) processing each said identified sequence value to
generate final data values indicative of a number of times
each identified sequence value is present in the library.
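
Steps (c) and (d) of claim 1 can be illustrated with a short Python sketch: each transcript sequence is compared against a reference database to obtain an identified sequence value (an annotation plus a degree of match), and the values are then tallied. The sketch is only an illustration under stated assumptions. The position-by-position identity score stands in for whatever sequence-comparison software is actually used, and the 0.9 threshold, the function names and the reference names GENE_A and GENE_B are all hypothetical.

```python
from collections import Counter


def identify(seq, references, min_identity=0.9):
    """Return (annotation, degree of match) for one transcript sequence."""
    best = None
    for name, ref in references.items():
        n = min(len(seq), len(ref))
        if n == 0:
            continue
        identity = sum(a == b for a, b in zip(seq, ref)) / n
        if best is None or identity > best[1]:
            best = (name, identity)
    if best is None or best[1] < min_identity:
        return ("unknown", "none")
    return (best[0], "exact" if best[1] == 1.0 else "non-exact")


def transcript_counts(seqs, references):
    """Tally how many times each identified sequence value occurs in the library."""
    return Counter(identify(s, references) for s in seqs)


references = {"GENE_A": "ACGTACGTAC", "GENE_B": "TTTTGGGGCC"}
library = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAA", "TTTTGGGGCC", "CCCCCCCCCC"]
for value, count in transcript_counts(library, references).items():
    print(value, count)
```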

2. The method of claim 1, wherein step (a) includes
the steps of:

obtaining a mixture of mRNA;

making cDNA copies of the mRNA;
isolating a representative population of clones
transfected with the cDNA and producing therefrom the
library of biological sequences.

3. The method of claim 1, wherein the biological
sequences are cDNA sequences.

4. The method of claim 1, wherein the biological
sequences are RNA sequences.

5. The method of claim 1, wherein the biological
sequences are protein sequences.
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6. The method of clain, i, wherein a first value 
of said degree of match is indicative of an exact match, and a
second value of said degree of match is indicative of a
non-exact match.



^ o=„. r * "^'<i'¥"????4f? =P«=i»ens confining 

gene transcripts, said method comprising:

».«.=r:. ™r ° "-^-f-^^-' —i-. - 1.^ 

(c) generating a second set • ^ 

(d) processing the second set of transcript sequences
in said programmed computer to generate a second set of
identified sequence values, known as further identified
sequence values, where each of the further identified
sequence values is indicative of a sequence annotation and
a degree of match between one of the biological sequences
of the second library and at least one of the reference
sequences;

(e) processing each said further identified sequence
value to generate further final data values indicative of a
number of times each further identified sequence value is
present in the second library; and

(f) processing the final data values from the first
specimen and the further identified sequence values from
the second specimen to generate ratios of transcript
sequences, each of said ratio values indicative of

t":™ '"^'^ °' — n the two 
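
The ratio step recited in the preceding claim can be sketched in Python as follows. The sketch assumes that each specimen is summarized as a mapping from gene to clone count and that counts are normalized by library size before the ratio is taken; the claim text does not specify the normalization, and the function name and sample counts are hypothetical. A gene absent from the second specimen yields None instead of a division by zero.

```python
from fractions import Fraction


def abundance_ratios(counts_a, counts_b):
    """Ratio of relative abundance (specimen A over specimen B) for each gene."""
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    ratios = {}
    for gene in sorted(set(counts_a) | set(counts_b)):
        rel_a = Fraction(counts_a.get(gene, 0), total_a) if total_a else Fraction(0)
        rel_b = Fraction(counts_b.get(gene, 0), total_b) if total_b else Fraction(0)
        ratios[gene] = rel_a / rel_b if rel_b else None  # None: not seen in B
    return ratios


first = {"GENE_A": 10, "GENE_B": 5}
second = {"GENE_A": 5, "GENE_B": 10, "GENE_C": 15}
for gene, ratio in abundance_ratios(first, second).items():
    print(gene, ratio)
```

Genes whose ratio departs markedly from 1 are candidates for differential expression between the two specimens.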

the ^ population of mRNA transcripts from 

the biological specimen; °" 
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(b) identifying genes from which the mRNA was 
transcribed by a sequence-specific method; 

(c) determining numbers of mRNA transcripts 
corresponding to each of the genes; and 

(d) using the mRNA transcript numbers to determine
the relative abundance of mRNA transcripts within the
population of mRNA transcripts.
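
Steps (c) and (d) above amount to converting per-gene transcript numbers into relative abundances. A minimal Python sketch follows, assuming the transcript numbers are already available as a gene-to-count mapping; the function name and the gene names are hypothetical.

```python
def gene_transcript_image(transcript_counts):
    """Return (gene, relative abundance) pairs, most abundant first."""
    total = sum(transcript_counts.values())
    if total == 0:
        return []
    image = [(gene, n / total) for gene, n in transcript_counts.items()]
    image.sort(key=lambda item: (-item[1], item[0]))
    return image


for gene, share in gene_transcript_image({"GENE_B": 1, "GENE_A": 3}):
    print(f"{gene}: {share:.2%}")
```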

9. A diagnostic method which comprises producing a
gene transcript image, said method comprising the steps of:
(a) isolating a population of mRNA transcripts from a
biological specimen;

(b) identifying genes from which the mRNA was
transcribed by a sequence-specific method;

(c) determining numbers of mRNA transcripts
corresponding to each of the genes; and

(d) using the mRNA transcript numbers to determine
the relative abundance of mRNA transcripts within the
population of mRNA transcripts, where data determining the
relative abundance values of mRNA transcripts is the gene
transcript image of the biological specimen.



10. The method of claim 9, further comprising:

(e) providing a set of standard normal and diseased
gene transcript images; and

(f) comparing the gene transcript image of the
biological specimen with the gene transcript images of step
(e) to identify at least one of the standard gene
transcript images which most closely approximate the gene
transcript image of the biological specimen.
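
The comparison in step (f) can be sketched in Python by treating each gene transcript image as a mapping from gene to relative abundance and choosing the standard image at the smallest distance from the specimen. The sum of absolute differences used here is an assumption chosen for simplicity; the claim does not name a similarity measure, and the function names and sample values are hypothetical.

```python
def image_distance(image_a, image_b):
    """Sum of absolute differences in relative abundance over all genes."""
    genes = set(image_a) | set(image_b)
    return sum(abs(image_a.get(g, 0.0) - image_b.get(g, 0.0)) for g in genes)


def closest_standard(specimen, standards):
    """Name of the standard image that most closely approximates the specimen."""
    return min(standards, key=lambda name: image_distance(specimen, standards[name]))


specimen = {"GENE_A": 0.6, "GENE_B": 0.4}
standards = {
    "normal": {"GENE_A": 0.5, "GENE_B": 0.5},
    "diseased": {"GENE_A": 0.1, "GENE_B": 0.9},
}
print(closest_standard(specimen, standards))
```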

11. The method of claim 9, wherein the biological
specimen is biopsy tissue, sputum, blood or urine.



12. A method of producing a gene transcript image,
said method comprising the steps of:

(a) obtaining a mixture of mRNA;

(b) making cDNA copies of the mRNA;
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usxng sax. vector to transfect suitable host strain cells 
^^^-^'^^^'"^ ^'^^'^^^^ to grow into clones, 

(d) isolating a representative population of
recombinant clones;

represented within the population of clones as an
indication of relative abundance; and

order Of '"^ theiigSilve abundance in 

5r - 9- transcript 

13 . The xniBthod of > W^^^ , v .. . 

«i J.. ; ! " f r valso includiho si-e« 

of diagnosing disease by r ^ ' PV - i • . ^^^.-^^ • 

sn.«. ^^P^^f "5- ^teps^^a)^ thrpugh:.(gr dri^iological ^ 

:^:=i:::r::.^o:^^ 

a'te«t T '-''^^^^^^^ and p;oducing 

a test gene, transcript image by performing steps ^J'^^"^ 

comparing the test gene transcript image with
reference sets of gene transcript images; and

identifying at least one of the reference gene

test 
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bio,»"' analyein, a library of 

biological sequences, said system including:

means for receiving a set of transcript sequences,
where each of the transcript sequences is indicative of a
different one of the biological sequences of the library;

=o«pu::r";:tir::i";::tr"^="^' -■^^^^^ 

m Which a database of reference transcript 
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sequ'ences^^^^ of reference biological sequences is ! 

stored, wherein the computer is programmed with software
for generating an identified sequence value for each of the
transcript sequences, where each said identified sequence
value is indicative of a sequence annotation and a degree
of match between a different one of the biological
sequences of the library and at least one of the reference
transcript sequences, and for processing each said
identified sequence value to generate final data values
indicative of a number of times each identified sequence
value is present in the library.

15. The system of claim 14, also including:

library generation means for producing the library of
biological sequences, and generating said transcript
sequences from said library.

16. The system of claim 15, wherein the library
generation means includes:
means for obtaining a mixture of mRNA;
means for making cDNA copies of the mRNA;
means for inserting the cDNA copies into cells and
permitting the cells to grow into clones;
means for isolating a representative population of the
clones and producing therefrom the library of biological
sequences.
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SYBASE database Structurei 




Figure 1

WO 95/20681    PCT/US95/01160

Figure 2
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